* [RFC PATCH 1/9] accel/tcg: Simplify use of &first_cpu in rr_cpu_thread_fn()
2025-01-28 14:21 [RFC PATCH 0/9] accel: Only include qdev-realized vCPUs in global &cpus_queue Philippe Mathieu-Daudé
@ 2025-01-28 14:21 ` Philippe Mathieu-Daudé
2025-01-28 19:44 ` Richard Henderson
2025-01-28 14:21 ` [RFC PATCH 2/9] accel/tcg: Invalidate TB jump cache with global vCPU queue locked Philippe Mathieu-Daudé
` (8 subsequent siblings)
9 siblings, 1 reply; 23+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-01-28 14:21 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Maydell, Yanan Wang, Eduardo Habkost, Harsh Prateek Bora,
kvm, Zhao Liu, Daniel P. Berrangé, Markus Armbruster,
Igor Mammedov, Marcel Apfelbaum, Peter Xu,
Philippe Mathieu-Daudé, Paolo Bonzini, Richard Henderson
Let vCPUs wait for themselves being ready first, then other ones.
This allows the first thread to starts without the global vcpu
queue (thus &first_cpu) being populated.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
accel/tcg/tcg-accel-ops-rr.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
index 028b385af9a..5ad3d617bce 100644
--- a/accel/tcg/tcg-accel-ops-rr.c
+++ b/accel/tcg/tcg-accel-ops-rr.c
@@ -197,20 +197,21 @@ static void *rr_cpu_thread_fn(void *arg)
qemu_guest_random_seed_thread_part2(cpu->random_seed);
/* wait for initial kick-off after machine start */
- while (first_cpu->stopped) {
- qemu_cond_wait_bql(first_cpu->halt_cond);
+ while (cpu->stopped) {
+ CPUState *iter_cpu;
+
+ qemu_cond_wait_bql(cpu->halt_cond);
/* process any pending work */
- CPU_FOREACH(cpu) {
- current_cpu = cpu;
- qemu_wait_io_event_common(cpu);
+ CPU_FOREACH(iter_cpu) {
+ current_cpu = iter_cpu;
+ qemu_wait_io_event_common(iter_cpu);
}
}
+ g_assert(first_cpu);
rr_start_kick_timer();
- cpu = first_cpu;
-
/* process any pending work */
cpu->exit_request = 1;
--
2.47.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [RFC PATCH 1/9] accel/tcg: Simplify use of &first_cpu in rr_cpu_thread_fn()
2025-01-28 14:21 ` [RFC PATCH 1/9] accel/tcg: Simplify use of &first_cpu in rr_cpu_thread_fn() Philippe Mathieu-Daudé
@ 2025-01-28 19:44 ` Richard Henderson
0 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2025-01-28 19:44 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, qemu-devel
Cc: Peter Maydell, Yanan Wang, Eduardo Habkost, Harsh Prateek Bora,
kvm, Zhao Liu, Daniel P. Berrangé, Markus Armbruster,
Igor Mammedov, Marcel Apfelbaum, Peter Xu, Paolo Bonzini
On 1/28/25 06:21, Philippe Mathieu-Daudé wrote:
> Let vCPUs wait for themselves being ready first, then other ones.
> This allows the first thread to starts without the global vcpu
> queue (thus &first_cpu) being populated.
>
> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> ---
> accel/tcg/tcg-accel-ops-rr.c | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
> index 028b385af9a..5ad3d617bce 100644
> --- a/accel/tcg/tcg-accel-ops-rr.c
> +++ b/accel/tcg/tcg-accel-ops-rr.c
> @@ -197,20 +197,21 @@ static void *rr_cpu_thread_fn(void *arg)
> qemu_guest_random_seed_thread_part2(cpu->random_seed);
>
> /* wait for initial kick-off after machine start */
> - while (first_cpu->stopped) {
> - qemu_cond_wait_bql(first_cpu->halt_cond);
> + while (cpu->stopped) {
> + CPUState *iter_cpu;
> +
> + qemu_cond_wait_bql(cpu->halt_cond);
>
> /* process any pending work */
> - CPU_FOREACH(cpu) {
> - current_cpu = cpu;
> - qemu_wait_io_event_common(cpu);
> + CPU_FOREACH(iter_cpu) {
> + current_cpu = iter_cpu;
> + qemu_wait_io_event_common(iter_cpu);
> }
> }
>
> + g_assert(first_cpu);
> rr_start_kick_timer();
>
> - cpu = first_cpu;
This final line seems to be unrelated.
I'm not saying it's wrong, but it's certainly a change in behaviour. We no longer execute
from first_cpu first.
r~
^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC PATCH 2/9] accel/tcg: Invalidate TB jump cache with global vCPU queue locked
2025-01-28 14:21 [RFC PATCH 0/9] accel: Only include qdev-realized vCPUs in global &cpus_queue Philippe Mathieu-Daudé
2025-01-28 14:21 ` [RFC PATCH 1/9] accel/tcg: Simplify use of &first_cpu in rr_cpu_thread_fn() Philippe Mathieu-Daudé
@ 2025-01-28 14:21 ` Philippe Mathieu-Daudé
2025-01-28 20:05 ` Richard Henderson
2025-01-28 14:21 ` [RFC PATCH 3/9] cpus: Remove cpu from global queue after UNREALIZE completed Philippe Mathieu-Daudé
` (7 subsequent siblings)
9 siblings, 1 reply; 23+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-01-28 14:21 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Maydell, Yanan Wang, Eduardo Habkost, Harsh Prateek Bora,
kvm, Zhao Liu, Daniel P. Berrangé, Markus Armbruster,
Igor Mammedov, Marcel Apfelbaum, Peter Xu,
Philippe Mathieu-Daudé, Paolo Bonzini, Richard Henderson
Invalidate TB with global vCPU queue locked.
See commit 4731f89b3b9 ("cpu: free cpu->tb_jmp_cache with RCU"):
Fixes the appended use-after-free. The root cause is that
during tb invalidation we use CPU_FOREACH, and therefore
to safely free a vCPU we must wait for an RCU grace period
to elapse.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
accel/tcg/tb-maint.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 3f1bebf6ab5..64471af439d 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -891,6 +891,8 @@ static void tb_jmp_cache_inval_tb(TranslationBlock *tb)
} else {
uint32_t h = tb_jmp_cache_hash_func(tb->pc);
+ QEMU_LOCK_GUARD(&qemu_cpu_list_lock);
+
CPU_FOREACH(cpu) {
CPUJumpCache *jc = cpu->tb_jmp_cache;
--
2.47.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [RFC PATCH 2/9] accel/tcg: Invalidate TB jump cache with global vCPU queue locked
2025-01-28 14:21 ` [RFC PATCH 2/9] accel/tcg: Invalidate TB jump cache with global vCPU queue locked Philippe Mathieu-Daudé
@ 2025-01-28 20:05 ` Richard Henderson
0 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2025-01-28 20:05 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, qemu-devel
Cc: Peter Maydell, Yanan Wang, Eduardo Habkost, Harsh Prateek Bora,
kvm, Zhao Liu, Daniel P. Berrangé, Markus Armbruster,
Igor Mammedov, Marcel Apfelbaum, Peter Xu, Paolo Bonzini
On 1/28/25 06:21, Philippe Mathieu-Daudé wrote:
> Invalidate TB with global vCPU queue locked.
>
> See commit 4731f89b3b9 ("cpu: free cpu->tb_jmp_cache with RCU"):
>
> Fixes the appended use-after-free. The root cause is that
> during tb invalidation we use CPU_FOREACH, and therefore
> to safely free a vCPU we must wait for an RCU grace period
> to elapse.
>
> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> ---
> accel/tcg/tb-maint.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
> index 3f1bebf6ab5..64471af439d 100644
> --- a/accel/tcg/tb-maint.c
> +++ b/accel/tcg/tb-maint.c
> @@ -891,6 +891,8 @@ static void tb_jmp_cache_inval_tb(TranslationBlock *tb)
> } else {
> uint32_t h = tb_jmp_cache_hash_func(tb->pc);
>
> + QEMU_LOCK_GUARD(&qemu_cpu_list_lock);
> +
> CPU_FOREACH(cpu) {
> CPUJumpCache *jc = cpu->tb_jmp_cache;
>
I can see how maybe this can appear to fix the bug, because one can't remove cpus at all
while the lock is held. But if the description is accurate that this is RCU related, then
the proper locking is with rcu_read_lock/rcu_read_unlock.
r~
^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC PATCH 3/9] cpus: Remove cpu from global queue after UNREALIZE completed
2025-01-28 14:21 [RFC PATCH 0/9] accel: Only include qdev-realized vCPUs in global &cpus_queue Philippe Mathieu-Daudé
2025-01-28 14:21 ` [RFC PATCH 1/9] accel/tcg: Simplify use of &first_cpu in rr_cpu_thread_fn() Philippe Mathieu-Daudé
2025-01-28 14:21 ` [RFC PATCH 2/9] accel/tcg: Invalidate TB jump cache with global vCPU queue locked Philippe Mathieu-Daudé
@ 2025-01-28 14:21 ` Philippe Mathieu-Daudé
2025-01-28 20:08 ` Richard Henderson
2025-01-28 14:21 ` [RFC PATCH 4/9] hw/qdev: Introduce DeviceClass::[un]wire() handlers Philippe Mathieu-Daudé
` (6 subsequent siblings)
9 siblings, 1 reply; 23+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-01-28 14:21 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Maydell, Yanan Wang, Eduardo Habkost, Harsh Prateek Bora,
kvm, Zhao Liu, Daniel P. Berrangé, Markus Armbruster,
Igor Mammedov, Marcel Apfelbaum, Peter Xu,
Philippe Mathieu-Daudé, Paolo Bonzini, Richard Henderson
Previous commit removed the restriction on completing the full QDev
UNREALIZE step before removing vCPUs from global queue, it is now
safe to call cpu_list_remove() after accel_cpu_common_unrealize().
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
cpu-target.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/cpu-target.c b/cpu-target.c
index 667688332c9..11592e2583f 100644
--- a/cpu-target.c
+++ b/cpu-target.c
@@ -172,12 +172,9 @@ void cpu_exec_unrealizefn(CPUState *cpu)
}
#endif
- cpu_list_remove(cpu);
- /*
- * Now that the vCPU has been removed from the RCU list, we can call
- * accel_cpu_common_unrealize, which may free fields using call_rcu.
- */
accel_cpu_common_unrealize(cpu);
+
+ cpu_list_remove(cpu);
}
/*
--
2.47.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [RFC PATCH 3/9] cpus: Remove cpu from global queue after UNREALIZE completed
2025-01-28 14:21 ` [RFC PATCH 3/9] cpus: Remove cpu from global queue after UNREALIZE completed Philippe Mathieu-Daudé
@ 2025-01-28 20:08 ` Richard Henderson
0 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2025-01-28 20:08 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, qemu-devel
Cc: Peter Maydell, Yanan Wang, Eduardo Habkost, Harsh Prateek Bora,
kvm, Zhao Liu, Daniel P. Berrangé, Markus Armbruster,
Igor Mammedov, Marcel Apfelbaum, Peter Xu, Paolo Bonzini
On 1/28/25 06:21, Philippe Mathieu-Daudé wrote:
> Previous commit removed the restriction on completing the full QDev
> UNREALIZE step before removing vCPUs from global queue, it is now
> safe to call cpu_list_remove() after accel_cpu_common_unrealize().
>
> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> ---
> cpu-target.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/cpu-target.c b/cpu-target.c
> index 667688332c9..11592e2583f 100644
> --- a/cpu-target.c
> +++ b/cpu-target.c
> @@ -172,12 +172,9 @@ void cpu_exec_unrealizefn(CPUState *cpu)
> }
> #endif
>
> - cpu_list_remove(cpu);
> - /*
> - * Now that the vCPU has been removed from the RCU list, we can call
> - * accel_cpu_common_unrealize, which may free fields using call_rcu.
> - */
> accel_cpu_common_unrealize(cpu);
> +
> + cpu_list_remove(cpu);
> }
I don't believe this is correct. Why would we have an unrealized cpu on the list? What's
wrong with removing the cpu from the list before unrealize?
r~
^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC PATCH 4/9] hw/qdev: Introduce DeviceClass::[un]wire() handlers
2025-01-28 14:21 [RFC PATCH 0/9] accel: Only include qdev-realized vCPUs in global &cpus_queue Philippe Mathieu-Daudé
` (2 preceding siblings ...)
2025-01-28 14:21 ` [RFC PATCH 3/9] cpus: Remove cpu from global queue after UNREALIZE completed Philippe Mathieu-Daudé
@ 2025-01-28 14:21 ` Philippe Mathieu-Daudé
2025-01-28 20:52 ` Richard Henderson
2025-01-28 14:21 ` [PATCH 5/9] cpus: Add DeviceClass::[un]wire() stubs Philippe Mathieu-Daudé
` (5 subsequent siblings)
9 siblings, 1 reply; 23+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-01-28 14:21 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Maydell, Yanan Wang, Eduardo Habkost, Harsh Prateek Bora,
kvm, Zhao Liu, Daniel P. Berrangé, Markus Armbruster,
Igor Mammedov, Marcel Apfelbaum, Peter Xu,
Philippe Mathieu-Daudé, Paolo Bonzini, Richard Henderson
We are trying to understand what means "a qdev is realized".
One explanation was "the device is guest visible"; however
many devices are realized before being mapped, thus are not
"guest visible". Some devices map / wire their IRQs before
being realized (such ISA devices). There is a need for devices
to be "automatically" mapped/wired (see [2]) such CLI-created
devices, but this apply generically to dynamic machines.
Currently the device creation steps are expected to roughly be:
(external use) (QDev core) (Device Impl)
~~~~~~~~~~~~ ~~~~~~~~~ ~~~~~~~~~~~
INIT enter
----->
+----------------------+
| Allocate state |
+----------------------+
----->
+---------------------+
| INIT children |
| |
| Alias children properties
| |
| Expose properties |
INIT exit +---------------------+
<-----------------------------------
+----------------+
| set properties |
| |
| set ClkIn |
+----------------+ REALIZE enter
---------------------------------->
+----------------------+
| Use config properties|
| |
| Realize children |
| |
| Init GPIOs/IRQs |
| |
| Init MemoryRegions |
+----------------------+
REALIZE exit
<----------------------------------- ---- "realized" / "guest visible"
+-----------------+
| Explicit wiring:|
| IRQs |
| I/O / Mem |
| ClkOut |
+-----------------+ RESET enter
--------------------------------->
+----------------------+
| Reset default values |
+----------------------+
But as mentioned, various devices "wire" parts before they exit
the "realize" step.
In order to clarify, I'm trying to enforce what can be done
*before* and *after* realization.
*after* a device is expected to be stable (no more configurable)
and fully usable.
To be able to use internal/auto wiring (such ISA devices) and
keep the current external/explicit wiring, I propose to add an
extra "internal wiring" step, happening after the REALIZE step
as:
(external use) (QDev core) (Device Impl)
~~~~~~~~~~~~ ~~~~~~~~~ ~~~~~~~~~~~
INIT enter
----->
+----------------------+
| Allocate state |
+----------------------+
----->
+---------------------+
| INIT children |
| |
| Alias children properties
| |
| Expose properties |
INIT exit +---------------------+
<-----------------------------------
+----------------+
| set properties |
| |
| set ClkIn |
+----------------+ REALIZE enter
---------------------------------->
+----------------------+
| Use config properties|
| |
| Realize children |
| |
| Init GPIOs/IRQs |
| |
| Init MemoryRegions |
+----------------------+
REALIZE exit <---
+----------------------+
| Internal auto wiring |
| IRQs | (i.e. ISA bus)
| I/O / Mem |
| ClkOut |
+----------------------+
<--- ---- "realized"
+-----------------+
| External wiring:|
| IRQs |
| I/O / Mem |
| ClkOut |
+-----------------+ RESET enter ---- "guest visible"
--------------------------------->
+----------------------+
| Reset default values |
+----------------------+
The "realized" point is not changed. "guest visible" concept only
occurs *after* wiring, just before the reset phase.
This change introduces the DeviceClass::wire handler within qdev
core realization code.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
include/hw/qdev-core.h | 7 +++++++
hw/core/qdev.c | 20 +++++++++++++++++++-
2 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 530f3da7021..021bb7afdc0 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -102,7 +102,12 @@ typedef int (*DeviceSyncConfig)(DeviceState *dev, Error **errp);
* @props: Properties accessing state fields.
* @realize: Callback function invoked when the #DeviceState:realized
* property is changed to %true.
+ * @wire: Callback function called after @realize to connect IRQs,
+ * clocks and map memories. Can not fail.
+ * @unwire: Callback function to undo @wire. Called before @unrealize.
+ * Can not fail.
* @unrealize: Callback function invoked when the #DeviceState:realized
+ * property is changed to %false. Can not fail.
* property is changed to %false.
* @sync_config: Callback function invoked when QMP command device-sync-config
* is called. Should synchronize device configuration from host to guest part
@@ -171,6 +176,8 @@ struct DeviceClass {
*/
DeviceReset legacy_reset;
DeviceRealize realize;
+ void (*wire)(DeviceState *dev);
+ void (*unwire)(DeviceState *dev);
DeviceUnrealize unrealize;
DeviceSyncConfig sync_config;
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 82bbdcb654e..38449255365 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -554,6 +554,15 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
}
}
+ if (dc->wire) {
+ if (!dc->unwire) {
+ warn_report_once("wire() without unwire() for type '%s'",
+ object_get_typename(OBJECT(dev)));
+ }
+ dc->wire(dev);
+ }
+
+ /* At this point the device is "guest visible". */
qatomic_store_release(&dev->realized, value);
} else if (!value && dev->realized) {
@@ -573,6 +582,15 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
*/
smp_wmb();
+ if (dc->unwire) {
+ if (!dc->wire) {
+ error_report("disconnect() without connect() for type '%s'",
+ object_get_typename(OBJECT(dev)));
+ abort();
+ }
+ dc->unwire(dev);
+ }
+
QLIST_FOREACH(bus, &dev->child_bus, sibling) {
qbus_unrealize(bus);
}
@@ -585,8 +603,8 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
dev->pending_deleted_event = true;
DEVICE_LISTENER_CALL(unrealize, Reverse, dev);
}
-
assert(local_err == NULL);
+
return;
child_realize_fail:
--
2.47.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [RFC PATCH 4/9] hw/qdev: Introduce DeviceClass::[un]wire() handlers
2025-01-28 14:21 ` [RFC PATCH 4/9] hw/qdev: Introduce DeviceClass::[un]wire() handlers Philippe Mathieu-Daudé
@ 2025-01-28 20:52 ` Richard Henderson
2025-02-04 13:39 ` Igor Mammedov
0 siblings, 1 reply; 23+ messages in thread
From: Richard Henderson @ 2025-01-28 20:52 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, qemu-devel
Cc: Peter Maydell, Yanan Wang, Eduardo Habkost, Harsh Prateek Bora,
kvm, Zhao Liu, Daniel P. Berrangé, Markus Armbruster,
Igor Mammedov, Marcel Apfelbaum, Peter Xu, Paolo Bonzini
On 1/28/25 06:21, Philippe Mathieu-Daudé wrote:
> We are trying to understand what means "a qdev is realized".
> One explanation was "the device is guest visible"; however
> many devices are realized before being mapped, thus are not
> "guest visible". Some devices map / wire their IRQs before
> being realized (such ISA devices). There is a need for devices
> to be "automatically" mapped/wired (see [2]) such CLI-created
> devices, but this apply generically to dynamic machines.
>
> Currently the device creation steps are expected to roughly be:
>
> (external use) (QDev core) (Device Impl)
> ~~~~~~~~~~~~ ~~~~~~~~~ ~~~~~~~~~~~
>
> INIT enter
> ----->
> +----------------------+
> | Allocate state |
> +----------------------+
> ----->
> +---------------------+
> | INIT children |
> | |
> | Alias children properties
> | |
> | Expose properties |
> INIT exit +---------------------+
> <-----------------------------------
> +----------------+
> | set properties |
> | |
> | set ClkIn |
> +----------------+ REALIZE enter
> ---------------------------------->
> +----------------------+
> | Use config properties|
> | |
> | Realize children |
> | |
> | Init GPIOs/IRQs |
> | |
> | Init MemoryRegions |
> +----------------------+
> REALIZE exit
> <----------------------------------- ---- "realized" / "guest visible"
> +-----------------+
> | Explicit wiring:|
> | IRQs |
> | I/O / Mem |
> | ClkOut |
> +-----------------+ RESET enter
> --------------------------------->
> +----------------------+
> | Reset default values |
> +----------------------+
>
> But as mentioned, various devices "wire" parts before they exit
> the "realize" step.
> In order to clarify, I'm trying to enforce what can be done
> *before* and *after* realization.
>
> *after* a device is expected to be stable (no more configurable)
> and fully usable.
>
> To be able to use internal/auto wiring (such ISA devices) and
> keep the current external/explicit wiring, I propose to add an
> extra "internal wiring" step, happening after the REALIZE step
> as:
>
> (external use) (QDev core) (Device Impl)
> ~~~~~~~~~~~~ ~~~~~~~~~ ~~~~~~~~~~~
>
> INIT enter
> ----->
> +----------------------+
> | Allocate state |
> +----------------------+
> ----->
> +---------------------+
> | INIT children |
> | |
> | Alias children properties
> | |
> | Expose properties |
> INIT exit +---------------------+
> <-----------------------------------
> +----------------+
> | set properties |
> | |
> | set ClkIn |
> +----------------+ REALIZE enter
> ---------------------------------->
> +----------------------+
> | Use config properties|
> | |
> | Realize children |
> | |
> | Init GPIOs/IRQs |
> | |
> | Init MemoryRegions |
> +----------------------+
> REALIZE exit <---
> +----------------------+
> | Internal auto wiring |
> | IRQs | (i.e. ISA bus)
> | I/O / Mem |
> | ClkOut |
> +----------------------+
> <--- ---- "realized"
> +-----------------+
> | External wiring:|
> | IRQs |
> | I/O / Mem |
> | ClkOut |
> +-----------------+ RESET enter ---- "guest visible"
> --------------------------------->
> +----------------------+
> | Reset default values |
> +----------------------+
>
> The "realized" point is not changed. "guest visible" concept only
> occurs *after* wiring, just before the reset phase.
>
> This change introduces the DeviceClass::wire handler within qdev
> core realization code.
>
> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> ---
> include/hw/qdev-core.h | 7 +++++++
> hw/core/qdev.c | 20 +++++++++++++++++++-
> 2 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index 530f3da7021..021bb7afdc0 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -102,7 +102,12 @@ typedef int (*DeviceSyncConfig)(DeviceState *dev, Error **errp);
> * @props: Properties accessing state fields.
> * @realize: Callback function invoked when the #DeviceState:realized
> * property is changed to %true.
> + * @wire: Callback function called after @realize to connect IRQs,
> + * clocks and map memories. Can not fail.
> + * @unwire: Callback function to undo @wire. Called before @unrealize.
> + * Can not fail.
> * @unrealize: Callback function invoked when the #DeviceState:realized
> + * property is changed to %false. Can not fail.
> * property is changed to %false.
> * @sync_config: Callback function invoked when QMP command device-sync-config
> * is called. Should synchronize device configuration from host to guest part
> @@ -171,6 +176,8 @@ struct DeviceClass {
> */
> DeviceReset legacy_reset;
> DeviceRealize realize;
> + void (*wire)(DeviceState *dev);
> + void (*unwire)(DeviceState *dev);
> DeviceUnrealize unrealize;
> DeviceSyncConfig sync_config;
>
> diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> index 82bbdcb654e..38449255365 100644
> --- a/hw/core/qdev.c
> +++ b/hw/core/qdev.c
> @@ -554,6 +554,15 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
> }
> }
>
> + if (dc->wire) {
> + if (!dc->unwire) {
> + warn_report_once("wire() without unwire() for type '%s'",
> + object_get_typename(OBJECT(dev)));
> + }
> + dc->wire(dev);
> + }
> +
> + /* At this point the device is "guest visible". */
> qatomic_store_release(&dev->realized, value);
>
> } else if (!value && dev->realized) {
> @@ -573,6 +582,15 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
> */
> smp_wmb();
>
> + if (dc->unwire) {
> + if (!dc->wire) {
> + error_report("disconnect() without connect() for type '%s'",
> + object_get_typename(OBJECT(dev)));
> + abort();
> + }
> + dc->unwire(dev);
> + }
Mismatched error messages (wire vs connect).
But, really, just check both directions properly at startup.
There's probably lots of places where devices are never unrealized.
Otherwise, this seems sane, having a kind of post_init on the realize path, running after
all superclass realization is done.
r~
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH 4/9] hw/qdev: Introduce DeviceClass::[un]wire() handlers
2025-01-28 20:52 ` Richard Henderson
@ 2025-02-04 13:39 ` Igor Mammedov
0 siblings, 0 replies; 23+ messages in thread
From: Igor Mammedov @ 2025-02-04 13:39 UTC (permalink / raw)
To: Richard Henderson
Cc: Philippe Mathieu-Daudé, qemu-devel, Peter Maydell,
Yanan Wang, Eduardo Habkost, Harsh Prateek Bora, kvm, Zhao Liu,
Daniel P. Berrangé, Markus Armbruster, Marcel Apfelbaum,
Peter Xu, Paolo Bonzini
On Tue, 28 Jan 2025 12:52:39 -0800
Richard Henderson <richard.henderson@linaro.org> wrote:
> On 1/28/25 06:21, Philippe Mathieu-Daudé wrote:
> > We are trying to understand what means "a qdev is realized".
> > One explanation was "the device is guest visible"; however
> > many devices are realized before being mapped, thus are not
> > "guest visible". Some devices map / wire their IRQs before
> > being realized (such ISA devices). There is a need for devices
> > to be "automatically" mapped/wired (see [2]) such CLI-created
> > devices, but this apply generically to dynamic machines.
> >
> > Currently the device creation steps are expected to roughly be:
> >
> > (external use) (QDev core) (Device Impl)
> > ~~~~~~~~~~~~ ~~~~~~~~~ ~~~~~~~~~~~
> >
> > INIT enter
> > ----->
> > +----------------------+
> > | Allocate state |
> > +----------------------+
> > ----->
> > +---------------------+
> > | INIT children |
> > | |
> > | Alias children properties
> > | |
> > | Expose properties |
> > INIT exit +---------------------+
> > <-----------------------------------
> > +----------------+
> > | set properties |
> > | |
> > | set ClkIn |
> > +----------------+ REALIZE enter
> > ---------------------------------->
> > +----------------------+
> > | Use config properties|
> > | |
> > | Realize children |
> > | |
> > | Init GPIOs/IRQs |
> > | |
> > | Init MemoryRegions |
> > +----------------------+
> > REALIZE exit
> > <----------------------------------- ---- "realized" / "guest visible"
> > +-----------------+
> > | Explicit wiring:|
> > | IRQs |
> > | I/O / Mem |
> > | ClkOut |
> > +-----------------+ RESET enter
> > --------------------------------->
> > +----------------------+
> > | Reset default values |
> > +----------------------+
> >
> > But as mentioned, various devices "wire" parts before they exit
> > the "realize" step.
> > In order to clarify, I'm trying to enforce what can be done
> > *before* and *after* realization.
> >
> > *after* a device is expected to be stable (no more configurable)
> > and fully usable.
> >
> > To be able to use internal/auto wiring (such ISA devices) and
> > keep the current external/explicit wiring, I propose to add an
> > extra "internal wiring" step, happening after the REALIZE step
we partially have such wiring in place
hotplug_handler_plug
'hotplug_' prefix is legacy remnant to confuse every, it really
should be renamed to just plug_handler or 'wire_foo'.
For bus-less CPUs it's used both on hot- and cold-plug paths.
It does take care of wiring after realize part is completed
(basically exposing device to the external users).
And I it also handles 'bus' based plug workflow
see qdev_get_hotplug_handler() assuming bus owner has provided a callback.
It's likely no-one had cared about ISA, as it was out of
hotplug scope when the interface was introduced.
Unplug part is however is not wired directly into unrealize() stage,
but rather to an external caller (mgmt or guest), which
essentially unwinds into the same hotplug_handler chain as plug,
to unwire device and then the same external caller does call unrealize
on device. (see:hotplug_handler_unplug() )
As for more direct hack, there is also DEVICE_LISTENER_CALL() callback,
which is more close to this approach (modulo users are hidden as
there is not explicit wiring).
Unrealize part is called too late (which probably is not right),
and realize part is called before subtree is realized (might work for
current users, but likely should be placed at the same site as wire() call.
Hence we should think if already existing methods could be reused/adapted to,
before adding similar wire/unwire mechanism.
> > as:
> >
> > (external use) (QDev core) (Device Impl)
> > ~~~~~~~~~~~~ ~~~~~~~~~ ~~~~~~~~~~~
> >
> > INIT enter
> > ----->
> > +----------------------+
> > | Allocate state |
> > +----------------------+
> > ----->
> > +---------------------+
> > | INIT children |
> > | |
> > | Alias children properties
> > | |
> > | Expose properties |
> > INIT exit +---------------------+
> > <-----------------------------------
> > +----------------+
> > | set properties |
> > | |
> > | set ClkIn |
> > +----------------+ REALIZE enter
> > ---------------------------------->
> > +----------------------+
> > | Use config properties|
> > | |
> > | Realize children |
> > | |
> > | Init GPIOs/IRQs |
> > | |
> > | Init MemoryRegions |
> > +----------------------+
> > REALIZE exit <---
> > +----------------------+
> > | Internal auto wiring |
> > | IRQs | (i.e. ISA bus)
> > | I/O / Mem |
> > | ClkOut |
> > +----------------------+
> > <--- ---- "realized"
> > +-----------------+
> > | External wiring:|
> > | IRQs |
> > | I/O / Mem |
> > | ClkOut |
> > +-----------------+ RESET enter ---- "guest visible"
> > --------------------------------->
> > +----------------------+
> > | Reset default values |
> > +----------------------+
> >
> > The "realized" point is not changed. "guest visible" concept only
> > occurs *after* wiring, just before the reset phase.
> >
> > This change introduces the DeviceClass::wire handler within qdev
> > core realization code.
> >
> > Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> > ---
> > include/hw/qdev-core.h | 7 +++++++
> > hw/core/qdev.c | 20 +++++++++++++++++++-
> > 2 files changed, 26 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> > index 530f3da7021..021bb7afdc0 100644
> > --- a/include/hw/qdev-core.h
> > +++ b/include/hw/qdev-core.h
> > @@ -102,7 +102,12 @@ typedef int (*DeviceSyncConfig)(DeviceState *dev, Error **errp);
> > * @props: Properties accessing state fields.
> > * @realize: Callback function invoked when the #DeviceState:realized
> > * property is changed to %true.
> > + * @wire: Callback function called after @realize to connect IRQs,
> > + * clocks and map memories. Can not fail.
> > + * @unwire: Callback function to undo @wire. Called before @unrealize.
> > + * Can not fail.
> > * @unrealize: Callback function invoked when the #DeviceState:realized
> > + * property is changed to %false. Can not fail.
> > * property is changed to %false.
> > * @sync_config: Callback function invoked when QMP command device-sync-config
> > * is called. Should synchronize device configuration from host to guest part
> > @@ -171,6 +176,8 @@ struct DeviceClass {
> > */
> > DeviceReset legacy_reset;
> > DeviceRealize realize;
> > + void (*wire)(DeviceState *dev);
> > + void (*unwire)(DeviceState *dev);
> > DeviceUnrealize unrealize;
> > DeviceSyncConfig sync_config;
> >
> > diff --git a/hw/core/qdev.c b/hw/core/qdev.c
> > index 82bbdcb654e..38449255365 100644
> > --- a/hw/core/qdev.c
> > +++ b/hw/core/qdev.c
> > @@ -554,6 +554,15 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
> > }
> > }
> >
> > + if (dc->wire) {
> > + if (!dc->unwire) {
> > + warn_report_once("wire() without unwire() for type '%s'",
> > + object_get_typename(OBJECT(dev)));
> > + }
> > + dc->wire(dev);
> > + }
> > +
> > + /* At this point the device is "guest visible". */
> > qatomic_store_release(&dev->realized, value);
> >
> > } else if (!value && dev->realized) {
> > @@ -573,6 +582,15 @@ static void device_set_realized(Object *obj, bool value, Error **errp)
> > */
> > smp_wmb();
> >
> > + if (dc->unwire) {
> > + if (!dc->wire) {
> > + error_report("disconnect() without connect() for type '%s'",
> > + object_get_typename(OBJECT(dev)));
> > + abort();
> > + }
> > + dc->unwire(dev);
> > + }
>
> Mismatched error messages (wire vs connect).
> But, really, just check both directions properly at startup.
> There's probably lots of places where devices are never unrealized.
>
> Otherwise, this seems sane, having a kind of post_init on the realize path, running after
> all superclass realization is done.
>
>
> r~
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 5/9] cpus: Add DeviceClass::[un]wire() stubs
2025-01-28 14:21 [RFC PATCH 0/9] accel: Only include qdev-realized vCPUs in global &cpus_queue Philippe Mathieu-Daudé
` (3 preceding siblings ...)
2025-01-28 14:21 ` [RFC PATCH 4/9] hw/qdev: Introduce DeviceClass::[un]wire() handlers Philippe Mathieu-Daudé
@ 2025-01-28 14:21 ` Philippe Mathieu-Daudé
2025-01-28 20:53 ` Richard Henderson
2025-01-28 14:21 ` [PATCH 6/9] cpus: Call hotplug handlers in DeviceWire() Philippe Mathieu-Daudé
` (4 subsequent siblings)
9 siblings, 1 reply; 23+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-01-28 14:21 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Maydell, Yanan Wang, Eduardo Habkost, Harsh Prateek Bora,
kvm, Zhao Liu, Daniel P. Berrangé, Markus Armbruster,
Igor Mammedov, Marcel Apfelbaum, Peter Xu,
Philippe Mathieu-Daudé, Paolo Bonzini, Richard Henderson
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
hw/core/cpu-common.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/hw/core/cpu-common.c b/hw/core/cpu-common.c
index cb79566cc51..9ee44a00277 100644
--- a/hw/core/cpu-common.c
+++ b/hw/core/cpu-common.c
@@ -219,6 +219,14 @@ static void cpu_common_realizefn(DeviceState *dev, Error **errp)
/* NOTE: latest generic point where the cpu is fully realized */
}
+static void cpu_common_wire(DeviceState *dev)
+{
+}
+
+static void cpu_common_unwire(DeviceState *dev)
+{
+}
+
static void cpu_common_unrealizefn(DeviceState *dev)
{
CPUState *cpu = CPU(dev);
@@ -311,6 +319,8 @@ static void cpu_common_class_init(ObjectClass *klass, void *data)
k->gdb_write_register = cpu_common_gdb_write_register;
set_bit(DEVICE_CATEGORY_CPU, dc->categories);
dc->realize = cpu_common_realizefn;
+ dc->wire = cpu_common_wire;
+ dc->unwire = cpu_common_unwire;
dc->unrealize = cpu_common_unrealizefn;
rc->phases.hold = cpu_common_reset_hold;
cpu_class_init_props(dc);
--
2.47.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 5/9] cpus: Add DeviceClass::[un]wire() stubs
2025-01-28 14:21 ` [PATCH 5/9] cpus: Add DeviceClass::[un]wire() stubs Philippe Mathieu-Daudé
@ 2025-01-28 20:53 ` Richard Henderson
0 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2025-01-28 20:53 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, qemu-devel
Cc: Peter Maydell, Yanan Wang, Eduardo Habkost, Harsh Prateek Bora,
kvm, Zhao Liu, Daniel P. Berrangé, Markus Armbruster,
Igor Mammedov, Marcel Apfelbaum, Peter Xu, Paolo Bonzini
On 1/28/25 06:21, Philippe Mathieu-Daudé wrote:
> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> ---
> hw/core/cpu-common.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/hw/core/cpu-common.c b/hw/core/cpu-common.c
> index cb79566cc51..9ee44a00277 100644
> --- a/hw/core/cpu-common.c
> +++ b/hw/core/cpu-common.c
> @@ -219,6 +219,14 @@ static void cpu_common_realizefn(DeviceState *dev, Error **errp)
> /* NOTE: latest generic point where the cpu is fully realized */
> }
>
> +static void cpu_common_wire(DeviceState *dev)
> +{
> +}
> +
> +static void cpu_common_unwire(DeviceState *dev)
> +{
> +}
> +
> static void cpu_common_unrealizefn(DeviceState *dev)
> {
> CPUState *cpu = CPU(dev);
> @@ -311,6 +319,8 @@ static void cpu_common_class_init(ObjectClass *klass, void *data)
> k->gdb_write_register = cpu_common_gdb_write_register;
> set_bit(DEVICE_CATEGORY_CPU, dc->categories);
> dc->realize = cpu_common_realizefn;
> + dc->wire = cpu_common_wire;
> + dc->unwire = cpu_common_unwire;
> dc->unrealize = cpu_common_unrealizefn;
> rc->phases.hold = cpu_common_reset_hold;
> cpu_class_init_props(dc);
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
But doesn't need to be split from patch 6.
r~
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 6/9] cpus: Call hotplug handlers in DeviceWire()
2025-01-28 14:21 [RFC PATCH 0/9] accel: Only include qdev-realized vCPUs in global &cpus_queue Philippe Mathieu-Daudé
` (4 preceding siblings ...)
2025-01-28 14:21 ` [PATCH 5/9] cpus: Add DeviceClass::[un]wire() stubs Philippe Mathieu-Daudé
@ 2025-01-28 14:21 ` Philippe Mathieu-Daudé
2025-01-28 20:53 ` Richard Henderson
2025-01-28 14:21 ` [RFC PATCH 7/9] cpus: Only expose REALIZED vCPUs to global &cpus_queue Philippe Mathieu-Daudé
` (3 subsequent siblings)
9 siblings, 1 reply; 23+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-01-28 14:21 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Maydell, Yanan Wang, Eduardo Habkost, Harsh Prateek Bora,
kvm, Zhao Liu, Daniel P. Berrangé, Markus Armbruster,
Igor Mammedov, Marcel Apfelbaum, Peter Xu,
Philippe Mathieu-Daudé, Paolo Bonzini, Richard Henderson
To call the hotplug handlers with REALIZED vCPU, we can
use the DeviceWire handler.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
hw/core/cpu-common.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/hw/core/cpu-common.c b/hw/core/cpu-common.c
index 9ee44a00277..8a02ac146f6 100644
--- a/hw/core/cpu-common.c
+++ b/hw/core/cpu-common.c
@@ -211,16 +211,17 @@ static void cpu_common_realizefn(DeviceState *dev, Error **errp)
}
}
- if (dev->hotplugged) {
- cpu_synchronize_post_init(cpu);
- cpu_resume(cpu);
- }
-
/* NOTE: latest generic point where the cpu is fully realized */
}
static void cpu_common_wire(DeviceState *dev)
{
+ CPUState *cpu = CPU(dev);
+
+ if (dev->hotplugged) {
+ cpu_synchronize_post_init(cpu);
+ cpu_resume(cpu);
+ }
}
static void cpu_common_unwire(DeviceState *dev)
--
2.47.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 6/9] cpus: Call hotplug handlers in DeviceWire()
2025-01-28 14:21 ` [PATCH 6/9] cpus: Call hotplug handlers in DeviceWire() Philippe Mathieu-Daudé
@ 2025-01-28 20:53 ` Richard Henderson
0 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2025-01-28 20:53 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, qemu-devel
Cc: Peter Maydell, Yanan Wang, Eduardo Habkost, Harsh Prateek Bora,
kvm, Zhao Liu, Daniel P. Berrangé, Markus Armbruster,
Igor Mammedov, Marcel Apfelbaum, Peter Xu, Paolo Bonzini
On 1/28/25 06:21, Philippe Mathieu-Daudé wrote:
> To call the hotplug handlers with REALIZED vCPU, we can
> use the DeviceWire handler.
>
> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> ---
> hw/core/cpu-common.c | 11 ++++++-----
> 1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/hw/core/cpu-common.c b/hw/core/cpu-common.c
> index 9ee44a00277..8a02ac146f6 100644
> --- a/hw/core/cpu-common.c
> +++ b/hw/core/cpu-common.c
> @@ -211,16 +211,17 @@ static void cpu_common_realizefn(DeviceState *dev, Error **errp)
> }
> }
>
> - if (dev->hotplugged) {
> - cpu_synchronize_post_init(cpu);
> - cpu_resume(cpu);
> - }
> -
> /* NOTE: latest generic point where the cpu is fully realized */
> }
>
> static void cpu_common_wire(DeviceState *dev)
> {
> + CPUState *cpu = CPU(dev);
> +
> + if (dev->hotplugged) {
> + cpu_synchronize_post_init(cpu);
> + cpu_resume(cpu);
> + }
> }
>
> static void cpu_common_unwire(DeviceState *dev)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC PATCH 7/9] cpus: Only expose REALIZED vCPUs to global &cpus_queue
2025-01-28 14:21 [RFC PATCH 0/9] accel: Only include qdev-realized vCPUs in global &cpus_queue Philippe Mathieu-Daudé
` (5 preceding siblings ...)
2025-01-28 14:21 ` [PATCH 6/9] cpus: Call hotplug handlers in DeviceWire() Philippe Mathieu-Daudé
@ 2025-01-28 14:21 ` Philippe Mathieu-Daudé
2025-01-28 20:53 ` Richard Henderson
2025-01-28 20:55 ` Richard Henderson
2025-01-28 14:21 ` [PATCH 8/9] accel/kvm: Assert vCPU is created when calling kvm_dirty_ring_reap*() Philippe Mathieu-Daudé
` (2 subsequent siblings)
9 siblings, 2 replies; 23+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-01-28 14:21 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Maydell, Yanan Wang, Eduardo Habkost, Harsh Prateek Bora,
kvm, Zhao Liu, Daniel P. Berrangé, Markus Armbruster,
Igor Mammedov, Marcel Apfelbaum, Peter Xu,
Philippe Mathieu-Daudé, Paolo Bonzini, Richard Henderson
cpu_list_add() was doing 2 distinct things:
- assign some index to vCPU
- add unrealized (thus in inconsistent state) vcpu to &cpus_queue
Code using CPU_FOREACH() macro would iterate over possibly
unrealized vCPUs, often dealt with special casing.
In order to avoid that, we move the addition of vCPU to global queue
to the DeviceWire handler, which is called just before switching the
vCPU to REALIZED state. This ensure all &cpus_queue users (like via
&first_cpu or CPU_FOREACH) get a realized vCPU in consistent state.
Similarly we remove it from the global queue at DeviceUnwire phase,
just after marking the vCPU UNREALIZED.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
cpu-common.c | 2 --
hw/core/cpu-common.c | 5 +++++
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/cpu-common.c b/cpu-common.c
index 4248b2d727e..72ee8dc414e 100644
--- a/cpu-common.c
+++ b/cpu-common.c
@@ -91,7 +91,6 @@ void cpu_list_add(CPUState *cpu)
} else {
assert(!cpu_index_auto_assigned);
}
- QTAILQ_INSERT_TAIL_RCU(&cpus_queue, cpu, node);
cpu_list_generation_id++;
}
@@ -103,7 +102,6 @@ void cpu_list_remove(CPUState *cpu)
return;
}
- QTAILQ_REMOVE_RCU(&cpus_queue, cpu, node);
cpu->cpu_index = UNASSIGNED_CPU_INDEX;
cpu_list_generation_id++;
}
diff --git a/hw/core/cpu-common.c b/hw/core/cpu-common.c
index 8a02ac146f6..df7a6913603 100644
--- a/hw/core/cpu-common.c
+++ b/hw/core/cpu-common.c
@@ -218,6 +218,8 @@ static void cpu_common_wire(DeviceState *dev)
{
CPUState *cpu = CPU(dev);
+ QTAILQ_INSERT_TAIL_RCU(&cpus_queue, cpu, node);
+
if (dev->hotplugged) {
cpu_synchronize_post_init(cpu);
cpu_resume(cpu);
@@ -226,6 +228,9 @@ static void cpu_common_wire(DeviceState *dev)
static void cpu_common_unwire(DeviceState *dev)
{
+ CPUState *cpu = CPU(dev);
+
+ QTAILQ_REMOVE_RCU(&cpus_queue, cpu, node);
}
static void cpu_common_unrealizefn(DeviceState *dev)
--
2.47.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [RFC PATCH 7/9] cpus: Only expose REALIZED vCPUs to global &cpus_queue
2025-01-28 14:21 ` [RFC PATCH 7/9] cpus: Only expose REALIZED vCPUs to global &cpus_queue Philippe Mathieu-Daudé
@ 2025-01-28 20:53 ` Richard Henderson
2025-01-28 20:55 ` Richard Henderson
1 sibling, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2025-01-28 20:53 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, qemu-devel
Cc: Peter Maydell, Yanan Wang, Eduardo Habkost, Harsh Prateek Bora,
kvm, Zhao Liu, Daniel P. Berrangé, Markus Armbruster,
Igor Mammedov, Marcel Apfelbaum, Peter Xu, Paolo Bonzini
On 1/28/25 06:21, Philippe Mathieu-Daudé wrote:
> cpu_list_add() was doing 2 distinct things:
> - assign some index to vCPU
> - add unrealized (thus in inconsistent state) vcpu to &cpus_queue
>
> Code using CPU_FOREACH() macro would iterate over possibly
> unrealized vCPUs, often dealt with special casing.
>
> In order to avoid that, we move the addition of vCPU to global queue
> to the DeviceWire handler, which is called just before switching the
> vCPU to REALIZED state. This ensure all &cpus_queue users (like via
> &first_cpu or CPU_FOREACH) get a realized vCPU in consistent state.
>
> Similarly we remove it from the global queue at DeviceUnwire phase,
> just after marking the vCPU UNREALIZED.
>
> Signed-off-by: Philippe Mathieu-Daudé<philmd@linaro.org>
> ---
> cpu-common.c | 2 --
> hw/core/cpu-common.c | 5 +++++
> 2 files changed, 5 insertions(+), 2 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH 7/9] cpus: Only expose REALIZED vCPUs to global &cpus_queue
2025-01-28 14:21 ` [RFC PATCH 7/9] cpus: Only expose REALIZED vCPUs to global &cpus_queue Philippe Mathieu-Daudé
2025-01-28 20:53 ` Richard Henderson
@ 2025-01-28 20:55 ` Richard Henderson
1 sibling, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2025-01-28 20:55 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, qemu-devel
Cc: Peter Maydell, Yanan Wang, Eduardo Habkost, Harsh Prateek Bora,
kvm, Zhao Liu, Daniel P. Berrangé, Markus Armbruster,
Igor Mammedov, Marcel Apfelbaum, Peter Xu, Paolo Bonzini
On 1/28/25 06:21, Philippe Mathieu-Daudé wrote:
> @@ -91,7 +91,6 @@ void cpu_list_add(CPUState *cpu)
> } else {
> assert(!cpu_index_auto_assigned);
> }
> - QTAILQ_INSERT_TAIL_RCU(&cpus_queue, cpu, node);
> cpu_list_generation_id++;
> }
>
> @@ -103,7 +102,6 @@ void cpu_list_remove(CPUState *cpu)
> return;
> }
>
> - QTAILQ_REMOVE_RCU(&cpus_queue, cpu, node);
> cpu->cpu_index = UNASSIGNED_CPU_INDEX;
> cpu_list_generation_id++;
> }
We might rename cpu_list_add/remove, since they no longer do what's said on the tin.
r~
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 8/9] accel/kvm: Assert vCPU is created when calling kvm_dirty_ring_reap*()
2025-01-28 14:21 [RFC PATCH 0/9] accel: Only include qdev-realized vCPUs in global &cpus_queue Philippe Mathieu-Daudé
` (6 preceding siblings ...)
2025-01-28 14:21 ` [RFC PATCH 7/9] cpus: Only expose REALIZED vCPUs to global &cpus_queue Philippe Mathieu-Daudé
@ 2025-01-28 14:21 ` Philippe Mathieu-Daudé
2025-01-28 20:57 ` Richard Henderson
2025-01-28 14:21 ` [PATCH 9/9] accel/kvm: Remove unreachable assertion in kvm_dirty_ring_reap*() Philippe Mathieu-Daudé
2025-02-07 15:45 ` [RFC PATCH 0/9] accel: Only include qdev-realized vCPUs in global &cpus_queue Igor Mammedov
9 siblings, 1 reply; 23+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-01-28 14:21 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Maydell, Yanan Wang, Eduardo Habkost, Harsh Prateek Bora,
kvm, Zhao Liu, Daniel P. Berrangé, Markus Armbruster,
Igor Mammedov, Marcel Apfelbaum, Peter Xu,
Philippe Mathieu-Daudé, Paolo Bonzini, Richard Henderson
Previous commits made sure vCPUs are realized before accelerators
(such KVM) use them. Ensure that by asserting the vCPU is created,
no need to return.
For more context, see commit 56adee407fc ("kvm: dirty-ring: Fix race
with vcpu creation").
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
accel/kvm/kvm-all.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index c65b790433c..cb56d120a91 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -831,13 +831,11 @@ static uint32_t kvm_dirty_ring_reap_one(KVMState *s, CPUState *cpu)
uint32_t count = 0, fetch = cpu->kvm_fetch_index;
/*
- * It's possible that we race with vcpu creation code where the vcpu is
+ * It's not possible that we race with vcpu creation code where the vcpu is
* put onto the vcpus list but not yet initialized the dirty ring
- * structures. If so, skip it.
+ * structures.
*/
- if (!cpu->created) {
- return 0;
- }
+ assert(cpu->created);
assert(dirty_gfns && ring_size);
trace_kvm_dirty_ring_reap_vcpu(cpu->cpu_index);
--
2.47.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 8/9] accel/kvm: Assert vCPU is created when calling kvm_dirty_ring_reap*()
2025-01-28 14:21 ` [PATCH 8/9] accel/kvm: Assert vCPU is created when calling kvm_dirty_ring_reap*() Philippe Mathieu-Daudé
@ 2025-01-28 20:57 ` Richard Henderson
0 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2025-01-28 20:57 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, qemu-devel
Cc: Peter Maydell, Yanan Wang, Eduardo Habkost, Harsh Prateek Bora,
kvm, Zhao Liu, Daniel P. Berrangé, Markus Armbruster,
Igor Mammedov, Marcel Apfelbaum, Peter Xu, Paolo Bonzini
On 1/28/25 06:21, Philippe Mathieu-Daudé wrote:
> Previous commits made sure vCPUs are realized before accelerators
> (such KVM) use them. Ensure that by asserting the vCPU is created,
> no need to return.
>
> For more context, see commit 56adee407fc ("kvm: dirty-ring: Fix race
> with vcpu creation").
>
> Signed-off-by: Philippe Mathieu-Daudé<philmd@linaro.org>
> ---
> accel/kvm/kvm-all.c | 8 +++-----
> 1 file changed, 3 insertions(+), 5 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 9/9] accel/kvm: Remove unreachable assertion in kvm_dirty_ring_reap*()
2025-01-28 14:21 [RFC PATCH 0/9] accel: Only include qdev-realized vCPUs in global &cpus_queue Philippe Mathieu-Daudé
` (7 preceding siblings ...)
2025-01-28 14:21 ` [PATCH 8/9] accel/kvm: Assert vCPU is created when calling kvm_dirty_ring_reap*() Philippe Mathieu-Daudé
@ 2025-01-28 14:21 ` Philippe Mathieu-Daudé
2025-01-28 20:58 ` Richard Henderson
2025-02-07 15:45 ` [RFC PATCH 0/9] accel: Only include qdev-realized vCPUs in global &cpus_queue Igor Mammedov
9 siblings, 1 reply; 23+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-01-28 14:21 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Maydell, Yanan Wang, Eduardo Habkost, Harsh Prateek Bora,
kvm, Zhao Liu, Daniel P. Berrangé, Markus Armbruster,
Igor Mammedov, Marcel Apfelbaum, Peter Xu,
Philippe Mathieu-Daudé, Paolo Bonzini, Richard Henderson
Previous commit passed all our CI tests, this assertion being
never triggered. Remove it as dead code.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
accel/kvm/kvm-all.c | 7 -------
1 file changed, 7 deletions(-)
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index cb56d120a91..814b1a53eb8 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -830,13 +830,6 @@ static uint32_t kvm_dirty_ring_reap_one(KVMState *s, CPUState *cpu)
uint32_t ring_size = s->kvm_dirty_ring_size;
uint32_t count = 0, fetch = cpu->kvm_fetch_index;
- /*
- * It's not possible that we race with vcpu creation code where the vcpu is
- * put onto the vcpus list but not yet initialized the dirty ring
- * structures.
- */
- assert(cpu->created);
-
assert(dirty_gfns && ring_size);
trace_kvm_dirty_ring_reap_vcpu(cpu->cpu_index);
--
2.47.1
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 9/9] accel/kvm: Remove unreachable assertion in kvm_dirty_ring_reap*()
2025-01-28 14:21 ` [PATCH 9/9] accel/kvm: Remove unreachable assertion in kvm_dirty_ring_reap*() Philippe Mathieu-Daudé
@ 2025-01-28 20:58 ` Richard Henderson
0 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2025-01-28 20:58 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, qemu-devel
Cc: Peter Maydell, Yanan Wang, Eduardo Habkost, Harsh Prateek Bora,
kvm, Zhao Liu, Daniel P. Berrangé, Markus Armbruster,
Igor Mammedov, Marcel Apfelbaum, Peter Xu, Paolo Bonzini
On 1/28/25 06:21, Philippe Mathieu-Daudé wrote:
> Previous commit passed all our CI tests, this assertion being
> never triggered. Remove it as dead code.
>
> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> ---
> accel/kvm/kvm-all.c | 7 -------
> 1 file changed, 7 deletions(-)
>
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index cb56d120a91..814b1a53eb8 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -830,13 +830,6 @@ static uint32_t kvm_dirty_ring_reap_one(KVMState *s, CPUState *cpu)
> uint32_t ring_size = s->kvm_dirty_ring_size;
> uint32_t count = 0, fetch = cpu->kvm_fetch_index;
>
> - /*
> - * It's not possible that we race with vcpu creation code where the vcpu is
> - * put onto the vcpus list but not yet initialized the dirty ring
> - * structures.
> - */
> - assert(cpu->created);
> -
> assert(dirty_gfns && ring_size);
> trace_kvm_dirty_ring_reap_vcpu(cpu->cpu_index);
>
I'd be ok squashing this, but also ok with retaining the patch separate for the CI comment.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH 0/9] accel: Only include qdev-realized vCPUs in global &cpus_queue
2025-01-28 14:21 [RFC PATCH 0/9] accel: Only include qdev-realized vCPUs in global &cpus_queue Philippe Mathieu-Daudé
` (8 preceding siblings ...)
2025-01-28 14:21 ` [PATCH 9/9] accel/kvm: Remove unreachable assertion in kvm_dirty_ring_reap*() Philippe Mathieu-Daudé
@ 2025-02-07 15:45 ` Igor Mammedov
2025-02-09 18:02 ` Philippe Mathieu-Daudé
9 siblings, 1 reply; 23+ messages in thread
From: Igor Mammedov @ 2025-02-07 15:45 UTC (permalink / raw)
To: Philippe Mathieu-Daudé
Cc: qemu-devel, Peter Maydell, Yanan Wang, Eduardo Habkost,
Harsh Prateek Bora, kvm, Zhao Liu, Daniel P. Berrangé,
Markus Armbruster, Marcel Apfelbaum, Peter Xu, Paolo Bonzini,
Richard Henderson
On Tue, 28 Jan 2025 15:21:43 +0100
Philippe Mathieu-Daudé <philmd@linaro.org> wrote:
> Hi,
>
> The goal of this series is to expose vCPUs in a stable state
> to the accelerators, in particular the QDev 'REALIZED' step.
I'll take some of your patches (with Richard's feedback fixed),
and respin series focusing mostly on realize part.
(and drop wire/unwire parts, reshuffling cpu code instead)
>
> To do so we split the QTAILQ_INSERT/REMOVE calls from
> cpu_list_add() / cpu_list_remove(), by moving them to the
> DeviceClass::[un]wire() handlers, guaranty to be called just
> before a vCPU is exposed to the guest, as "realized".
>
> First we have to modify how &first_cpu is used in TCG round
> robin implementation, and ensure we invalidate the TB jmpcache
> with &qemu_cpu_list locked.
>
> I'm really out of my comfort zone here, so posting as RFC. At
> least all test suite is passing...
>
> I expect these changes to allow CPUState::cpu_index clarifications
> and simplifications, but this will be addressed (and commented) in
> a separate series.
>
> Regards,
>
> Phil.
>
> Philippe Mathieu-Daudé (9):
> accel/tcg: Simplify use of &first_cpu in rr_cpu_thread_fn()
> accel/tcg: Invalidate TB jump cache with global vCPU queue locked
> cpus: Remove cpu from global queue after UNREALIZE completed
> hw/qdev: Introduce DeviceClass::[un]wire() handlers
> cpus: Add DeviceClass::[un]wire() stubs
> cpus: Call hotplug handlers in DeviceWire()
> cpus: Only expose REALIZED vCPUs to global &cpus_queue
> accel/kvm: Assert vCPU is created when calling kvm_dirty_ring_reap*()
> accel/kvm: Remove unreachable assertion in kvm_dirty_ring_reap*()
>
> include/hw/qdev-core.h | 7 +++++++
> accel/kvm/kvm-all.c | 9 ---------
> accel/tcg/tb-maint.c | 2 ++
> accel/tcg/tcg-accel-ops-rr.c | 15 ++++++++-------
> cpu-common.c | 2 --
> cpu-target.c | 7 ++-----
> hw/core/cpu-common.c | 18 +++++++++++++++++-
> hw/core/qdev.c | 20 +++++++++++++++++++-
> 8 files changed, 55 insertions(+), 25 deletions(-)
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC PATCH 0/9] accel: Only include qdev-realized vCPUs in global &cpus_queue
2025-02-07 15:45 ` [RFC PATCH 0/9] accel: Only include qdev-realized vCPUs in global &cpus_queue Igor Mammedov
@ 2025-02-09 18:02 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 23+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-02-09 18:02 UTC (permalink / raw)
To: Igor Mammedov
Cc: qemu-devel, Peter Maydell, Yanan Wang, Eduardo Habkost,
Harsh Prateek Bora, kvm, Zhao Liu, Daniel P. Berrangé,
Markus Armbruster, Marcel Apfelbaum, Peter Xu, Paolo Bonzini,
Richard Henderson
On 7/2/25 16:45, Igor Mammedov wrote:
> On Tue, 28 Jan 2025 15:21:43 +0100
> Philippe Mathieu-Daudé <philmd@linaro.org> wrote:
>
>> Hi,
>>
>> The goal of this series is to expose vCPUs in a stable state
>> to the accelerators, in particular the QDev 'REALIZED' step.
>
> I'll take some of your patches (with Richard's feedback fixed),
> and respin series focusing mostly on realize part.
Great, thank you for helping!
> (and drop wire/unwire parts, reshuffling cpu code instead)
>
>>
>> To do so we split the QTAILQ_INSERT/REMOVE calls from
>> cpu_list_add() / cpu_list_remove(), by moving them to the
>> DeviceClass::[un]wire() handlers, guaranty to be called just
>> before a vCPU is exposed to the guest, as "realized".
>>
>> First we have to modify how &first_cpu is used in TCG round
>> robin implementation, and ensure we invalidate the TB jmpcache
>> with &qemu_cpu_list locked.
>>
>> I'm really out of my comfort zone here, so posting as RFC. At
>> least all test suite is passing...
>>
>> I expect these changes to allow CPUState::cpu_index clarifications
>> and simplifications, but this will be addressed (and commented) in
>> a separate series.
>>
>> Regards,
>>
>> Phil.
^ permalink raw reply [flat|nested] 23+ messages in thread