From: Salil Mehta via <qemu-devel@nongnu.org>
To: Richard Henderson <richard.henderson@linaro.org>,
	"salil.mehta@opnsrc.net" <salil.mehta@opnsrc.net>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	 "qemu-arm@nongnu.org" <qemu-arm@nongnu.org>,
	"mst@redhat.com" <mst@redhat.com>
Subject: RE: [PATCH RFC V6 24/24] tcg: Defer TB flush for 'lazy realized' vCPUs on first region alloc
Date: Thu, 2 Oct 2025 12:27:26 +0000
Message-ID: <bc780e0c68fa44da975d8f6fcdb38cd7@huawei.com>
In-Reply-To: <60631203-626f-4628-8a40-226bd45d1c8e@linaro.org>

Hi Richard,

Thanks for the reply. Please find my response inline.

Cheers.

> From: qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org <qemu-
> devel-bounces+salil.mehta=huawei.com@nongnu.org> On Behalf Of Richard
> Henderson
> Sent: Wednesday, October 1, 2025 10:34 PM
> To: salil.mehta@opnsrc.net; qemu-devel@nongnu.org; qemu-
> arm@nongnu.org; mst@redhat.com
> Subject: Re: [PATCH RFC V6 24/24] tcg: Defer TB flush for 'lazy realized' vCPUs
> on first region alloc
> 
> On 9/30/25 18:01, salil.mehta@opnsrc.net wrote:
> > From: Salil Mehta <salil.mehta@huawei.com>
> >
> > The TCG code cache is split into regions shared by vCPUs under MTTCG.
> > For cold-boot (early realized) vCPUs, regions are sized/allocated
> > during bring-up.
> > However, when a vCPU is *lazy_realized* (administratively "disabled"
> > at boot and realized later on demand), its TCGContext may fail the
> > very first code region allocation if the shared TB cache is saturated
> > by already-running vCPUs.
> >
> > Flushing the TB cache is the right remediation, but `tb_flush()` must
> > be performed from the safe execution context
> > (cpu_exec_loop()/tb_gen_code()).
> > This patch wires a deferred flush:
> >
> >    * In `tcg_region_initial_alloc__locked()`, treat an initial allocation
> >      failure for a lazily realized vCPU as non-fatal: set `s->tbflush_pend`
> >      and return.
> >
> >    * In `tcg_tb_alloc()`, if `s->tbflush_pend` is observed, clear it and
> >      return NULL so the caller performs a synchronous `tb_flush()` and then
> >      retries allocation.
> >
> > This avoids hangs observed when a newly realized vCPU cannot obtain
> > its first region under TB-cache pressure, while keeping the flush at
> > a safe point.
> >
> > No change for cold-boot vCPUs and when accel ops is KVM.
> >
> > In an earlier series, this patch was named:
> > 'tcg: Update tcg_register_thread() leg to handle region alloc for
> > hotplugged vCPU'
> 
> 
> I don't see why you need two different booleans for this.


I can see your point. Maybe I can move `s->tbflush_pend` to 'CPUState' instead?
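
i.e. roughly something like the below (hypothetical placement, untested,
just to illustrate moving the flag off TCGContext):

  /* include/hw/core/cpu.h -- hypothetical sketch, not a tested change */
  struct CPUState {
      /* ... existing fields ... */

      /*
       * Set when this lazily realized vCPU failed its first TCG region
       * allocation; consumed on the next TB allocation to force a
       * tb_flush() from the safe execution context.
       */
      bool tbflush_pend;
  };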


> It seems to me that you could create the cpu in a state for which the
> first call to tcg_tb_alloc() sees highwater state, and everything after
> that happens per usual allocating a new region, and possibly flushing
> the full buffer.


Correct, but with a distinction: the highwater state is per TCGContext,
while the regions themselves are allocated from a common pool, the code
generation buffer. 'code_gen_highwater' is used to detect whether the
current context needs another region so that dynamic translation can
continue. That is a different condition from the one we are encountering
here, which is the worst case: the entire code generation buffer is
saturated and not even a single free TCG region can be allocated. In that
case we have no option but to flush the entire buffer and reallocate
regions across all the threads, a rebalancing act to accommodate the new
vCPU. This is expensive, but the good thing is that it does not happen
every time; it is a worst-case condition, i.e. when the system is under
tremendous stress and running out of resources.
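
For reference, the deferred-flush flow of this patch looks roughly like
the below (a simplified sketch, not the exact hunks; the lazy-realized
check is an addition from this series and the helper name is
illustrative):

  /* tcg/region.c -- simplified sketch of the patch */
  static void tcg_region_initial_alloc__locked(TCGContext *s)
  {
      bool err = tcg_region_alloc__locked(s);

      if (err && cpu_lazy_realized(s->cpu)) {  /* helper name illustrative */
          /*
           * The shared code generation buffer is saturated, so not even
           * one region can be handed to this new context. tb_flush() is
           * only safe from cpu_exec_loop()/tb_gen_code(), so defer it.
           */
          s->tbflush_pend = true;
          return;
      }
      g_assert(!err);
  }

  TranslationBlock *tcg_tb_alloc(TCGContext *s)
  {
      if (unlikely(s->tbflush_pend)) {
          s->tbflush_pend = false;
          /* NULL makes the caller do a synchronous tb_flush() and retry */
          return NULL;
      }
      /* ... normal allocation, possibly growing into a new region ... */
  }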


We are avoiding this crash:

ERROR:../tcg/region.c:396:tcg_region_initial_alloc__locked: assertion failed: (!err)
Bail out! ERROR:../tcg/region.c:396:tcg_region_initial_alloc__locked: assertion failed: (!err)
./run-qemu.sh: line 8: 255346 Aborted                 (core dumped) ./qemu/build/qemu-system-aarch64 -M virt,accel=tcg

Dump is here:

Thread 65 "qemu-system-aar" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff48ff9640 (LWP 633577)]
0x00007ffff782f98c in __pthread_kill_implementation () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff782f98c in __pthread_kill_implementation () at /lib64/libc.so.6
#1  0x00007ffff77e2646 in raise () at /lib64/libc.so.6
#2  0x00007ffff77cc7f3 in abort () at /lib64/libc.so.6
#3  0x00007ffff7c21d6c in g_assertion_message_expr.cold () at /lib64/libglib-2.0.so.0
#4  0x00007ffff7c7ce2f in g_assertion_message_expr () at /lib64/libglib-2.0.so.0
#5  0x00005555561cf359 in tcg_region_initial_alloc__locked (s=0x7fff10000b60) at ../tcg/region.c:396
#6  0x00005555561cf3ab in tcg_region_initial_alloc (s=0x7fff10000b60) at ../tcg/region.c:402
#7  0x00005555561da83c in tcg_register_thread () at ../tcg/tcg.c:820
#8  0x00005555561a97bb in mttcg_cpu_thread_fn (arg=0x555557e0c2b0) at ../accel/tcg/tcg-accel-ops-mttcg.c:77
#9  0x00005555564f18ab in qemu_thread_start (args=0x5555582e2bc0) at ../util/qemu-thread-posix.c:541
#10 0x00007ffff782dc12 in start_thread () at /lib64/libc.so.6
#11 0x00007ffff78b2cc0 in clone3 () at /lib64/libc.so.6
(gdb)



> 
> What is the testcase for this?


As mentioned, this tackles the worst case, when the code generation buffer
runs out of space entirely. We need a better mitigation than simply
hitting the assert().

It is easily reproducible by decreasing 'tb_size', increasing the number
of vCPUs, and running larger programs simultaneously. I was able to
reproduce it with only 6 vCPUs and 'tb_size=10'. Booting was dead slow,
but a single vCPU hotplug action was enough to reproduce it.
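
For a concrete reproducer, something along these lines (a sketch; the
guest image/kernel arguments are omitted, and the property is spelled
tb-size on the -accel option):

  # shrink the translation buffer and raise the vCPU count
  ./qemu/build/qemu-system-aarch64 \
      -M virt -accel tcg,tb-size=10 \
      -smp 6 \
      ...
  # then, once the guest is up, trigger a single vCPU enable/hotplug
  # action from the monitor to force the first region allocation.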

RFC V6 has TCG broken for some other reason, and I'm trying to fix it.
But if you wish, you can try this on RFC V5, which has a greater chance of
hitting this, as it actually uses the vCPU hotplug approach, i.e. threads
can be created and deleted.

https://github.com/salil-mehta/qemu/commits/virt-cpuhp-armv8/rfc-v5/

With RFC V6 this condition is likely to happen only once, during the
delayed spawning of the vCPU thread of a vCPU being lazily realized. We do
not delete the spawned thread.

Many thanks!

Best regards
Salil.

> 
> 
> r~


