From: Salil Mehta via <qemu-devel@nongnu.org>
To: "Salil Mehta" <salil.mehta@huawei.com>,
"Alex Bennée" <alex.bennee@linaro.org>
Cc: Gustavo Romero <gustavo.romero@linaro.org>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"mst@redhat.com" <mst@redhat.com>,
"maz@kernel.org" <maz@kernel.org>,
"jean-philippe@linaro.org" <jean-philippe@linaro.org>,
Jonathan Cameron <jonathan.cameron@huawei.com>,
"lpieralisi@kernel.org" <lpieralisi@kernel.org>,
"peter.maydell@linaro.org" <peter.maydell@linaro.org>,
"richard.henderson@linaro.org" <richard.henderson@linaro.org>,
"imammedo@redhat.com" <imammedo@redhat.com>,
"andrew.jones@linux.dev" <andrew.jones@linux.dev>,
"david@redhat.com" <david@redhat.com>,
"philmd@linaro.org" <philmd@linaro.org>,
"eric.auger@redhat.com" <eric.auger@redhat.com>,
"will@kernel.org" <will@kernel.org>,
"ardb@kernel.org" <ardb@kernel.org>,
"oliver.upton@linux.dev" <oliver.upton@linux.dev>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"gshan@redhat.com" <gshan@redhat.com>,
"rafael@kernel.org" <rafael@kernel.org>,
"borntraeger@linux.ibm.com" <borntraeger@linux.ibm.com>,
"npiggin@gmail.com" <npiggin@gmail.com>,
"harshpb@linux.ibm.com" <harshpb@linux.ibm.com>,
"linux@armlinux.org.uk" <linux@armlinux.org.uk>,
"darren@os.amperecomputing.com" <darren@os.amperecomputing.com>,
"ilkka@os.amperecomputing.com" <ilkka@os.amperecomputing.com>,
"vishnu@os.amperecomputing.com" <vishnu@os.amperecomputing.com>,
"karl.heubaum@oracle.com" <karl.heubaum@oracle.com>,
"miguel.luis@oracle.com" <miguel.luis@oracle.com>,
"salil.mehta@opnsrc.net" <salil.mehta@opnsrc.net>,
zhukeqian <zhukeqian1@huawei.com>,
"wangxiongfeng (C)" <wangxiongfeng2@huawei.com>,
"wangyanan (Y)" <wangyanan55@huawei.com>,
"jiakernel2@gmail.com" <jiakernel2@gmail.com>,
"maobibo@loongson.cn" <maobibo@loongson.cn>,
"lixianglai@loongson.cn" <lixianglai@loongson.cn>,
"shahuang@redhat.com" <shahuang@redhat.com>,
"zhao1.liu@intel.com" <zhao1.liu@intel.com>,
Linuxarm <linuxarm@huawei.com>
Subject: RE: [PATCH RFC V3 00/29] Support of Virtual CPU Hotplug for ARMv8 Arch
Date: Fri, 6 Sep 2024 15:06:38 +0000
Message-ID: <389fd6c14eb744f79e6f5d34fa30c9c7@huawei.com>
In-Reply-To: <923008d9b65d45eba4e4ae19fe62f79c@huawei.com>
Hi Alex,
> From: qemu-arm-bounces+salil.mehta=huawei.com@nongnu.org <qemu-
> arm-bounces+salil.mehta=huawei.com@nongnu.org> On Behalf Of Salil
> Mehta via
> Sent: Wednesday, September 4, 2024 5:00 PM
> To: Alex Bennée <alex.bennee@linaro.org>
>
> Hi Alex,
>
> > From: Alex Bennée <alex.bennee@linaro.org>
> > Sent: Wednesday, September 4, 2024 4:46 PM
> > To: Salil Mehta <salil.mehta@huawei.com>
> >
> > Salil Mehta <salil.mehta@huawei.com> writes:
> >
> > > Hi Alex,
> > >
> > >> -----Original Message-----
> > >> From: Alex Bennée <alex.bennee@linaro.org>
> > >> Sent: Thursday, August 29, 2024 11:00 AM
> > >> To: Gustavo Romero <gustavo.romero@linaro.org>
> > >>
> > >> Gustavo Romero <gustavo.romero@linaro.org> writes:
> > >>
> > >> > Hi Salil,
> > >> >
> > >> > On 6/13/24 8:36 PM, Salil Mehta via wrote:
> > >> <snip>
> > >> >> (VI) Commands Used
> > >> >> ==================
> > >> >> A. Qemu launch commands to init the machine:
> > >> >> $ qemu-system-aarch64 --enable-kvm -machine virt,gic-version=3 \
> > >> >> -cpu host -smp cpus=4,maxcpus=6 \
> > >> >> -m 300M \
> > >> >> -kernel Image \
> > >> >> -initrd rootfs.cpio.gz \
> > >> >> -append "console=ttyAMA0 root=/dev/ram rdinit=/init maxcpus=2 acpi=force" \
> > >> >> -nographic \
> > >> >> -bios QEMU_EFI.fd \
> > >> >> B. Hot-(un)plug related commands:
> > >> >> # Hotplug a host vCPU (accel=kvm):
> > >> >> $ device_add host-arm-cpu,id=core4,core-id=4
> > >> >> # Hotplug a vCPU (accel=tcg):
> > >> >> $ device_add cortex-a57-arm-cpu,id=core4,core-id=4
> > >> >
> > >> > Since support for hotplug is disabled on TCG, remove these two lines
> > >> > in v4 cover letter?
> > >>
> > >> Why is it disabled for TCG? We should aim for TCG being as close to
> > >> KVM as possible for developers even if it is not a production solution.
> > >
> > > Agreed In principle. Yes, that would be of help.
> > >
> > >
> > > Context why it was disabled although most code to support TCG exist:
> > >
> > > I had reported a crash in the RFC V1 (June 2020) about TCGContext
> > > counter overflow assertion during repeated hot(un)plug operation.
> > > Miguel from Oracle was able to reproduce this problem last year in Feb
> > > and also suggested a fix but he later found out in his testing that
> > > there was a problem during migration.
> > >
> > > RFC V1 June 2020:
> > > https://lore.kernel.org/qemu-devel/20200613213629.21984-1-salil.mehta@huawei.com/
> > > Scroll to below:
> > > [...]
> > > THINGS TO DO:
> > > (*) Migration support
> > > (*) TCG/Emulation support is not proper right now. Works to a certain extent
> > > but is not complete. especially the unrealize part in which there is a
> > > overflow of tcg contexts. The last is due to the fact tcg maintains a
> > > count on number of context(per thread instance) so as we hotplug the vcpus
> > > this counter keeps on incrementing. But during hot-unplug the counter is
> > > not decremented.
> >
> > Right so the translation cache is segmented by vCPU to support
> > parallel JIT operations. The easiest solution would be to ensure we
> > dimension for the maximum number of vCPUs, which it should already, see
> > tcg_init_machine():
> >
> > unsigned max_cpus = ms->smp.max_cpus;
> > ...
> > tcg_init(s->tb_size * MiB, s->splitwx_enabled, max_cpus);
>
>
> Agreed. We have done that and have a patch for that as well. But it is still a
> work-in-progress and I've lost context a bit.
>
> https://github.com/salil-mehta/qemu/commit/107cf5ca7cf3716bc0f8c68e98e1da3939f449ce
>
> For now, I've very quickly tried to enable and run the TCG to gain back the
> context.
> I've now hit a different problem during TCG vCPU unrealization phase, while
> pthread_join() waits on halt condition variable for MTTCG vCPU thread to
> exit, there is a crash somewhere. Look like some race condition. Will dig this
> further.
It appears that there was a race between the destruction of the CPU Address
Space and the delayed processing of the tcg_commit_cpu() function. The
latter is primarily responsible for:
1. Updating the memory dispatch pointer
2. Performing the tlb_flush() operation
This process involves calling the CPU Address Space memory listener's
tcg_commit(), which queues this work item on the CPU so that the vCPU
thread executes it at the earliest opportunity. During ARM vCPU
unrealization, we were destroying the Address Space first and only then
calling cpu_remove_sync(). This resulted in the vCPU thread being kicked
out of its I/O wait state, which led it to process the pending vCPU work
queue items. Since the CPU Address Space had already been destroyed, this
caused the segmentation fault.
I've resolved this issue by delaying the destruction of the CPU Address
Space until cpu_remove_sync() has completed, but before the parent is
unrealized. With this change the crash no longer occurs, and vCPU hotplug
seems to be working on TCG now. I still need to test migration, which I
plan to do in the next couple of days. Please have a look at the patches
below and at the repository.
https://github.com/salil-mehta/qemu/commit/9fbb8ecbc61c6405db342cc243b2be17b1c97e03
https://github.com/salil-mehta/qemu/commit/1900893449c1b6a10e1534635f29bfb545b825d0
Please check the branch below:
https://github.com/salil-mehta/qemu/commits/virt-cpuhp-armv8/rfc-v4-rc5
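For anyone who wants the ordering problem in isolation, here is a minimal
toy model in C. It is only a sketch: every name in it (ToyCPU,
ToyAddressSpace, toy_queue_commit() and so on) is hypothetical and none of
it is QEMU code. It assumes a commit work item is already queued when
teardown starts, and it records a flag at the point where the real code
would have touched a destroyed Address Space instead of actually crashing.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; none of these are QEMU types or functions. */
typedef struct ToyAddressSpace {
    bool alive;            /* false once the address space is "destroyed" */
    int dispatch_updates;  /* counts simulated dispatch-pointer updates */
} ToyAddressSpace;

typedef struct ToyCPU {
    ToyAddressSpace *as;
    bool commit_queued;    /* a deferred commit work item is pending */
    bool hit_dead_as;      /* set where the real code would have crashed */
} ToyCPU;

/* Models the listener commit hook: defer the work to the vCPU thread. */
static void toy_queue_commit(ToyCPU *cpu)
{
    cpu->commit_queued = true;
}

/* Models the vCPU thread draining its work queue after being kicked. */
static void toy_drain_work(ToyCPU *cpu)
{
    if (!cpu->commit_queued) {
        return;
    }
    cpu->commit_queued = false;
    if (cpu->as == NULL || !cpu->as->alive) {
        /* The real code would dereference destroyed state here. */
        cpu->hit_dead_as = true;
        return;
    }
    cpu->as->dispatch_updates++;
}

/* Models the synchronous removal: kick the thread, let it drain, join. */
static void toy_remove_sync(ToyCPU *cpu)
{
    toy_drain_work(cpu);
}

static void toy_destroy_as(ToyCPU *cpu)
{
    cpu->as->alive = false;
}

/* Old order: destroy the address space, then remove the CPU. */
static bool toy_unrealize_old_order(ToyCPU *cpu)
{
    toy_queue_commit(cpu);
    toy_destroy_as(cpu);
    toy_remove_sync(cpu);
    return cpu->hit_dead_as;
}

/* New order: remove the CPU first, then destroy the address space. */
static bool toy_unrealize_new_order(ToyCPU *cpu)
{
    toy_queue_commit(cpu);
    toy_remove_sync(cpu);
    toy_destroy_as(cpu);
    return cpu->hit_dead_as;
}
```

Calling toy_unrealize_old_order() on a freshly initialised ToyCPU returns
true, because the queued item runs after the address space is gone.
toy_unrealize_new_order() returns false and the pending commit is applied
exactly once before teardown.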
Best regards
Salil.
> > > @ Feb 2023, [Linaro-open-discussions] Re: Qemu TCG support for virtual-cpuhotplug/online-policy
> > > https://op-lists.linaro.org/archives/list/linaro-open-discussions@op-lists.linaro.org/message/GMDFTEZE6WUUI7LZAYOWLXFHAPXLCND5/
> > >
> > > Last status reported by Miguel was that there was problem with the TCG
> > > and he intended to fix this. He was on paternity leave so I will try
> > > to gather the exact status of the TCG today.
> > >
> > > Thanks
> > > Salil
> > >
> > >
> > >>
> > >> --
> > >> Alex Bennée
> > >> Virtualisation Tech Lead @ Linaro
> >
> > --
> > Alex Bennée
> > Virtualisation Tech Lead @ Linaro