From: Salil Mehta <salil.mehta@huawei.com>
To: Andrew Jones <drjones@redhat.com>
Cc: "peter.maydell@linaro.org" <peter.maydell@linaro.org>,
"mehta.salil.lnk@gmail.com" <mehta.salil.lnk@gmail.com>,
"gshan@redhat.com" <gshan@redhat.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"mst@redhat.com" <mst@redhat.com>,
"jiakernel2@gmail.com" <jiakernel2@gmail.com>,
"maz@kernel.org" <maz@kernel.org>,
"david@redhat.com" <david@redhat.com>,
"richard.henderson@linaro.org" <richard.henderson@linaro.org>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
Linuxarm <linuxarm@huawei.com>,
"eric.auger@redhat.com" <eric.auger@redhat.com>,
"will@kernel.org" <will@kernel.org>,
"qemu-arm@nongnu.org" <qemu-arm@nongnu.org>,
"james.morse@arm.com" <james.morse@arm.com>,
"catalin.marinas@arm.com" <catalin.marinas@arm.com>,
"sudeep.holla@arm.com" <sudeep.holla@arm.com>,
"imammedo@redhat.com" <imammedo@redhat.com>,
"maran.wilson@oracle.com" <maran.wilson@oracle.com>,
zhukeqian <zhukeqian1@huawei.com>,
"wangxiongfeng \(C\)" <wangxiongfeng2@huawei.com>
Subject: RE: [PATCH RFC 00/22] Support of Virtual CPU Hotplug for ARMv8 Arch
Date: Tue, 23 Jun 2020 09:56:54 +0000
Message-ID: <ae90783d37454d8b9f5a189098e6bbb7@huawei.com>
In-Reply-To: <20200623091202.pbcbvwnk3pdvwyyy@kamzik.brq.redhat.com>
> From: Andrew Jones [mailto:drjones@redhat.com]
> Sent: Tuesday, June 23, 2020 10:12 AM
> To: Salil Mehta <salil.mehta@huawei.com>
> Cc: qemu-devel@nongnu.org; qemu-arm@nongnu.org; peter.maydell@linaro.org;
> sudeep.holla@arm.com; gshan@redhat.com; mst@redhat.com; jiakernel2@gmail.com;
> maz@kernel.org; zhukeqian <zhukeqian1@huawei.com>; david@redhat.com;
> richard.henderson@linaro.org; Linuxarm <linuxarm@huawei.com>;
> eric.auger@redhat.com; james.morse@arm.com; catalin.marinas@arm.com;
> imammedo@redhat.com; pbonzini@redhat.com; mehta.salil.lnk@gmail.com;
> maran.wilson@oracle.com; will@kernel.org; wangxiongfeng (C)
> <wangxiongfeng2@huawei.com>
> Subject: Re: [PATCH RFC 00/22] Support of Virtual CPU Hotplug for ARMv8 Arch
>
> On Sat, Jun 13, 2020 at 10:36:07PM +0100, Salil Mehta wrote:
> > This patch-set introduces virtual cpu hotplug support for the ARMv8
> > architecture in QEMU. The idea is to be able to hotplug and hot-unplug vcpus
> > while the guest VM is running, with no reboot required. This does *not* make
> > any assumption about physical cpu hotplug availability within the host system
> > but rather tries to solve the problem at the virtualizer/QEMU layer by
> > introducing cpu hotplug hooks and event handling within the guest kernel. No
> > changes are required within the host kernel/KVM.
> >
> > Motivation:
> > This allows scaling the guest VM compute capacity on-demand which would be
> > useful for the following example scenarios,
> > 1. Vertical Pod Autoscaling[3][4] in the cloud: Part of the orchestration
> > framework which could adjust resource requests (CPU and Mem requests) for
> > the containers in a pod, based on usage.
> > 2. Pay-as-you-grow Business Model: Infrastructure provider could allocate and
> > restrict the total number of compute resources available to the guest VM
> > according to the SLA (Service Level Agreement). VM owner could request
> > more compute to be hot-plugged for some cost.
> >
> > Terminology:
> >
> > (*) Present cpus: Total cpus with which the guest boots; these are available
> > to the guest for use and can be onlined. QEMU parameter (-smp)
> > (*) Disabled cpus: Possible cpus which are not yet available for the guest to
> > use. These can be hotplugged and made present. These can be
> > thought of as un-plugged vcpus. These are included as
> > part of sizing.
> > (*) Possible cpus: Total vcpus which could ever exist in the VM. This includes
> > booted cpus plus any cpus which could be plugged later.
> > - QEMU parameter (-maxcpus)
> > - Possible vcpus = Present vcpus (+) Disabled vcpus
> >
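A minimal sketch of the sizing arithmetic in the terminology above. It assumes that vcpu indices are contiguous from 0 and that the present vcpus come first, which is consistent with the sample sysfs output further down; the function name size_vcpus is hypothetical and not part of QEMU.

```python
def size_vcpus(cpus: int, maxcpus: int) -> dict:
    """Split '-smp cpus=<cpus>,maxcpus=<maxcpus>' into the three vcpu index sets."""
    if not 0 < cpus <= maxcpus:
        raise ValueError("need 0 < cpus <= maxcpus")
    return {
        "possible": list(range(maxcpus)),        # everything the VM is sized for
        "present": list(range(cpus)),            # available at boot
        "disabled": list(range(cpus, maxcpus)),  # can be hotplugged later
    }


sizing = size_vcpus(cpus=4, maxcpus=6)
print(sizing["present"], sizing["disabled"])
```

With the launch command used later in this mail (cpus=4, maxcpus=6), vcpus 4 and 5 would be the disabled ones.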
> >
> > Limitations of ARMv8 Architecture:
> >
> > A. Physical Limitation to CPU Hotplug:
> > 1. ARMv8 architecture does not support the concept of physical cpu hotplug.
> > The closest recommended way to achieve cpu hotplug on ARM is to bring
> > down the power state of the cpu using PSCI.
> > 2. Other ARM components like the GIC have not been designed to realize
> > physical cpu hotplug capability as of now.
> >
> > B. Limitations of GIC to Support Virtual CPU Hotplug:
> > 1. GIC requires various resources (related to GICR/redistributor, GICC/cpu
> > interface etc.) like memory regions to be fixed at VM init time and these
> > cannot be changed after the VM has initialized.
> > 2. Associations between GICC (GIC cpu interface) and vcpu get fixed at VM
> > init time and the GIC does not allow this association to change once it has
> > initialized.
> >
> > C. Known Limitation of the KVM:
> > 1. As of now KVM allows creating vcpus but does not allow deleting
> > already created vcpus. QEMU already provides an interface to manage created
> > vcpus at KVM level and then to re-use them.
> > 2. Inconsistency in interpretation of the MPIDR generated by KVM for vcpus
> > vis-a-vis SMT/threads. This does not look to be compliant with the MPIDR
> > format (SMT is present) as mentioned in the ARMv8 spec. (Please correct my
> > understanding if I am wrong here.)
> >
> >
> > Workaround to the problems mentioned in Section B & C1:
> > 1. We pre-size the GIC with possible vcpus at VM init time
> > 2. Pre-create all possible vcpus at KVM and associate them with GICC
> > 3. Park the unplugged vcpus (similar to x86)
> >
> >
> > (*) For all of above please refer to Marc's suggestion here[1]
> >
> >
> > Overview of the Approach:
> > At the time of machvirt_init() we pre-create all of the possible ARMCPU
> > objects along with the corresponding KVM vcpus at the host. Disabled KVM vcpu
> > (which are *not* "present" vcpus but are part of "possible" vcpu list) are
> > parked at per VM list "kvm_parked_vcpus" after their initialization.
> >
> > We create the ARMCPU objects (but these are not *realized* in QOM sense) even
> > for the disabled vcpus to facilitate the GIC initialization (pre-sized with
> > possible vcpus). After initialization of the machine is complete we release
> > the ARMCPU objects for the disabled vcpus. These ARMCPU objects shall be
> > re-created at the time when a vcpu is hot plugged. The new object is then
> > re-attached to the earlier parked KVM vcpu, which also gets unparked. The
> > ARMCPU object now gets "realized" in QEMU, which means creation of the
> > corresponding threads, pre_plug/plug phases, and event notification to the
> > guest using ACPI GED etc. Similarly, the hot-unplug leg will lead to the
> > "unrealization" of the vcpus and to similar ACPI GED events to the
> > guest for unplug and cleanup; eventually the ARMCPU object shall be released
> > and the KVM vcpus shall be parked again.
> >
> > During machine init, ACPI MADT Table is sized with *possible* vcpus GICC
> > entries. The unplugged/disabled vcpus are presented as MADT GICC DISABLED
> > entries to the guest. This means the guest will have its resources pre-sized
> > with possible vcpus(=present+disabled)
> >
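The lifecycle described in the overview above can be sketched as a toy state model. This is only an illustration of the bookkeeping (pre-create all possible vcpus, park the disabled ones, unpark on hotplug, park again on hot-unplug); the class ToyVM and its members are hypothetical and do not correspond to QEMU or KVM code.

```python
class ToyVM:
    """Toy model: tracks which vcpu ids are realized and which are parked."""

    def __init__(self, cpus: int, maxcpus: int):
        # All possible KVM vcpus exist from machine init; only the first
        # `cpus` of them are realized, the rest sit on the parked list.
        self.realized = set(range(cpus))
        self.parked = set(range(cpus, maxcpus))
        # MADT GICC entries are sized once, at init: (vcpu id, enabled flag).
        self.madt = [(i, i < cpus) for i in range(maxcpus)]

    def hotplug(self, vcpu_id: int) -> None:
        if vcpu_id not in self.parked:
            raise ValueError(f"vcpu {vcpu_id} is not parked")
        self.parked.remove(vcpu_id)   # unpark the pre-created KVM vcpu
        self.realized.add(vcpu_id)    # new CPU object gets realized

    def hot_unplug(self, vcpu_id: int) -> None:
        if vcpu_id not in self.realized:
            raise ValueError(f"vcpu {vcpu_id} is not realized")
        self.realized.remove(vcpu_id)  # CPU object is unrealized and released
        self.parked.add(vcpu_id)       # KVM vcpu is parked again, never deleted


vm = ToyVM(cpus=4, maxcpus=6)
vm.hotplug(4)
print(sorted(vm.realized), sorted(vm.parked))
```

Note that in this model the total number of vcpus (realized plus parked) never changes after init, which mirrors the constraint that KVM vcpus and GIC sizing are fixed at VM init time.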
> > Other approaches to deal with ARMCPU object release(after machine init):
> > 1. The ARMCPU objects for the disabled vcpus are released in context to the
> > virt_machine_done() notifier(approach adopted in this patch-set).
> > 2. Defer the release of current ARMCPU object till the new vcpu object is
> > hot plugged.
> > 3. Never release and keep on reusing them and release once at VM exit. This
> > solves many problems with the above 2 approaches but requires a change in the
> > way qdev_device_add() fetches/creates the ARMCPU object for the new vcpus
> > being hotplugged. For the arm cpu hotplug case we need to figure out how to
> > get access to the old object and use it to "re-realize" instead of the new
> > ARMCPU object.
> >
> > Concerns/Questions:
> > 1. In ARM arch a cpu is uniquely represented in hierarchy using various
> > affinity levels which could represent thread, core, cluster, package. This
> > is generally represented by a value in the MPIDR register as per the format
> > mentioned in the specification. Now, the MPIDR value for vcpus is derived
> > using vcpu-index. The concept of thread is not quite the same and rather
> > gets lost in the derivation of MPIDR for vcpus.
> > 2. The topology info used to specify the vcpu while hot-plugging might not
> > match the MPIDR value given back by KVM for the vcpu at the time of
> > init. The concept of the SMT bit in MPIDR gets lost as per the derivation
> > being done in KVM. Hence, would thread-id, core-id, socket-id, if used as
> > topology info to derive the MPIDR value as per the ARM specification, not
> > match the MPIDR actually assigned by KVM?
> > Perhaps need to carry forward work of Andrew? please check here[2]
> > 3. Further, if this info is supplied to the guest using PPTT (once introduced
> > in QEMU) or even derived using MPIDR, it shall be inconsistent with the host
> > vcpu.
> > 4. Any possibilities of interrupts(SGI/PPI/LPI/SPI) always remaining in
> > *pending* state for the cpus which have been hot-unplugged? IMHO it looks
> > okay but will need Marc's confirmation on this.
> > 5. If the ARMCPU object is released after the machine init, UEFI could call
> > back virt_update_table() to re-build the ACPI tables which might need an
> > ARMCPU object. Please check the discussion here[5]
> >
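To make concerns 1 and 2 above concrete, here is a small sketch of how an index-derived MPIDR can differ from a topology-derived one. The first function reflects my understanding of how KVM/arm64 packs the vcpu index into the affinity fields (4 bits in Aff0, then 8 bits each in Aff1 and Aff2, MT bit left clear); please treat that as an assumption to be checked against the kernel source. The second function is purely hypothetical: one possible thread/core/socket mapping with the MT bit set.

```python
MPIDR_RES1_BIT = 1 << 31  # bit 31 of MPIDR_EL1 reads as one
MPIDR_MT_BIT = 1 << 24    # set when Aff0 identifies a hardware thread


def index_derived_mpidr(vcpu_id: int) -> int:
    """Assumed KVM-style packing of a flat vcpu index into Aff0..Aff2."""
    aff0 = vcpu_id & 0x0F
    aff1 = (vcpu_id >> 4) & 0xFF
    aff2 = (vcpu_id >> 12) & 0xFF
    return MPIDR_RES1_BIT | (aff2 << 16) | (aff1 << 8) | aff0


def topology_derived_mpidr(socket_id: int, core_id: int, thread_id: int) -> int:
    """Hypothetical mapping: Aff0=thread, Aff1=core, Aff2=socket, MT set."""
    return (MPIDR_RES1_BIT | MPIDR_MT_BIT
            | (socket_id << 16) | (core_id << 8) | thread_id)


# vcpu index 4 on a guest described as 2 threads per core -> core 2, thread 0
print(hex(index_derived_mpidr(4)))
print(hex(topology_derived_mpidr(socket_id=0, core_id=2, thread_id=0)))
```

Under these assumptions the two values disagree both in the affinity fields and in the MT bit, which is the mismatch the concerns describe.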
> >
> > Commands Used:
> >
> > A. Qemu launch commands to init the machine
> >
> > $ qemu-system-aarch64 --enable-kvm -machine virt,gic-version=3 \
> > -cpu host -smp cpus=4,maxcpus=6 \
> > -m 300M \
> > -kernel Image \
> > -initrd rootfs.cpio.gz \
> > -append "console=ttyAMA0 root=/dev/ram rdinit=/init maxcpus=2 acpi=force" \
> > -nographic \
> > -bios QEMU_EFI.fd
> >
> > B. Hot-(un)plug related commands
> >
> > # Hotplug a host vcpu(accel=kvm)
> > $ device_add host-arm-cpu,id=core4,core-id=4
> >
> > # Hotplug a vcpu(accel=tcg)
> > $ device_add cortex-a57-arm-cpu,id=core4,core-id=4
> >
> > # Delete the vcpu
> > $ device_del core4
> >
> > NOTE: I have not tested the current solution with the '-device' interface. Its
> > use is suggested by Igor here[6]. I will test this soon, but it looks like
> > it should work with the existing changes.
> >
> >
> > Sample output on guest after boot:
> >
> > $ cat /sys/devices/system/cpu/possible
> > 0-5
> > $ cat /sys/devices/system/cpu/present
> > 0-3
> > $ cat /sys/devices/system/cpu/online
> > 0-1
> > $ cat /sys/devices/system/cpu/offline
> > 2-5
> >
> >
> > Sample output on guest after hotplug of vcpu=4:
> >
> > $ cat /sys/devices/system/cpu/possible
> > 0-5
> > $ cat /sys/devices/system/cpu/present
> > 0-4
> > $ cat /sys/devices/system/cpu/online
> > 0-1,4
> > $ cat /sys/devices/system/cpu/offline
> > 2-3,5
> >
> > Note: vcpu=4 was explicitly 'onlined' after hot-plug
> > $ echo 1 > /sys/devices/system/cpu/cpu4/online
> >
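As a quick cross-check of the sample output above, a short sketch that parses the sysfs cpu-list format ("0-1,4") and confirms that offline equals possible minus online. The helper name parse_cpulist is hypothetical; the strings are copied from the post-hotplug output above.

```python
def parse_cpulist(text: str) -> set:
    """Parse a sysfs cpu list such as '0-1,4' into a set of cpu numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means no cpus in this set
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


possible = parse_cpulist("0-5")
present = parse_cpulist("0-4")
online = parse_cpulist("0-1,4")
offline = parse_cpulist("2-3,5")

print(sorted(possible - online) == sorted(offline))
```

On a live guest the same strings could be read from the files under /sys/devices/system/cpu/ instead of being hard-coded.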
> >
> > Repository:
> > (*) QEMU changes for vcpu hotplug could be cloned from below site,
> > https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v1
> >
> > (*) Guest kernel changes required to co-work with QEMU shall be posted
> > soon and the repo made available at the above site.
> >
> >
> > THINGS TO DO:
> > (*) Migration support
> > (*) TCG/Emulation support is not proper right now. It works to a certain extent
> > but is not complete, especially the unrealize part, in which there is an
> > overflow of tcg contexts. This is because tcg maintains a
> > count of the number of contexts (per thread instance), so as we hotplug vcpus
> > this counter keeps incrementing, but during hot-unplug the counter is
> > not decremented.
> > (*) Support of hotplug with NUMA is not proper
> > (*) CPU Topology right now is not specified using thread/core/socket but
> > rather flatly indexed using core-id. This needs consideration[2].
> > (*) Do we need PPTT support to specify the right topology info to the guest about
> > hot-plugged or unplugged vcpus?
> > (*) Test cases
> > (*) Docs need to be updated.
> >
> >
>
> Hi Salil,
>
> I realize this is just a preliminary posting and the approach hasn't been
> finalized, but maybe in a future posting we can put a lot of this
> information into a doc patch. I think we'll need good documentation for
> this feature to ensure we get it right and keep it maintained correctly.
Sure, let us do it once we converge on the concept.
Thanks
Salil.