* [PATCH] system/runstate: Fix regression, clarify BQL status of exit notifiers
@ 2025-01-12 21:26 Phil Dennis-Jordan
2025-01-15 18:01 ` David Woodhouse
2025-01-15 19:05 ` Paolo Bonzini
0 siblings, 2 replies; 5+ messages in thread
From: Phil Dennis-Jordan @ 2025-01-12 21:26 UTC
To: qemu-devel; +Cc: phil, pbonzini, philmd, akihiko.odaki, dwmw2
By changing the way the main QEMU event loop is invoked, I inadvertently
changed the BQL status of exit notifiers: some of them implicitly
assumed they would be called with the BQL held; the BQL is however
not held during the exit(status) call in qemu_default_main().
Instead of attempting to ensure that we always call exit() with the BQL
held - including any transitive calls - this change adds a BQL lock guard
to qemu_run_exit_notifiers(), ensuring the BQL will always be held in the
exit notifiers.
Additionally, the BQL promise is now documented at the
qemu_{add,remove}_exit_notifier() declarations.
Fixes: f5ab12caba4f ("ui & main loop: Redesign of system-specific main
thread event handling")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2771
Reported-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
---
include/system/system.h | 1 +
system/runstate.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/include/system/system.h b/include/system/system.h
index 5364ad4f27..0cbb43ec30 100644
--- a/include/system/system.h
+++ b/include/system/system.h
@@ -15,6 +15,7 @@ extern bool qemu_uuid_set;
const char *qemu_get_vm_name(void);
+/* Exit notifiers will run with BQL held. */
void qemu_add_exit_notifier(Notifier *notify);
void qemu_remove_exit_notifier(Notifier *notify);
diff --git a/system/runstate.c b/system/runstate.c
index 3a8fe866bc..272801d307 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -850,6 +850,7 @@ void qemu_remove_exit_notifier(Notifier *notify)
static void qemu_run_exit_notifiers(void)
{
+ BQL_LOCK_GUARD();
notifier_list_notify(&exit_notifiers, NULL);
}
--
2.39.5 (Apple Git-154)
* Re: [PATCH] system/runstate: Fix regression, clarify BQL status of exit notifiers
2025-01-12 21:26 [PATCH] system/runstate: Fix regression, clarify BQL status of exit notifiers Phil Dennis-Jordan
@ 2025-01-15 18:01 ` David Woodhouse
2025-01-15 19:05 ` Paolo Bonzini
1 sibling, 0 replies; 5+ messages in thread
From: David Woodhouse @ 2025-01-15 18:01 UTC
To: Phil Dennis-Jordan, qemu-devel; +Cc: pbonzini, philmd, akihiko.odaki
On Sun, 2025-01-12 at 22:26 +0100, Phil Dennis-Jordan wrote:
> By changing the way the main QEMU event loop is invoked, I inadvertently
> changed the BQL status of exit notifiers: some of them implicitly
> assumed they would be called with the BQL held; the BQL is however
> not held during the exit(status) call in qemu_default_main().
>
> Instead of attempting to ensure that we always call exit() with the BQL
> held - including any transitive calls - this change adds a BQL lock guard
> to qemu_run_exit_notifiers(), ensuring the BQL will always be held in the
> exit notifiers.
>
> Additionally, the BQL promise is now documented at the
> qemu_{add,remove}_exit_notifier() declarations.
>
> Fixes: f5ab12caba4f ("ui & main loop: Redesign of system-specific main
> thread event handling")
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2771
> Reported-by: David Woodhouse <dwmw2@infradead.org>
> Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
(Sorry, I thought I'd done that already).
Is someone else going to pick this up, or should I round it up with the
Xen fixes for which I'm likely to send a pull request tomorrow?
* Re: [PATCH] system/runstate: Fix regression, clarify BQL status of exit notifiers
2025-01-12 21:26 [PATCH] system/runstate: Fix regression, clarify BQL status of exit notifiers Phil Dennis-Jordan
2025-01-15 18:01 ` David Woodhouse
@ 2025-01-15 19:05 ` Paolo Bonzini
2025-01-15 19:17 ` Phil Dennis-Jordan
1 sibling, 1 reply; 5+ messages in thread
From: Paolo Bonzini @ 2025-01-15 19:05 UTC
To: Phil Dennis-Jordan, qemu-devel; +Cc: philmd, akihiko.odaki, dwmw2
On 1/12/25 22:26, Phil Dennis-Jordan wrote:
> By changing the way the main QEMU event loop is invoked, I inadvertently
> changed the BQL status of exit notifiers: some of them implicitly
> assumed they would be called with the BQL held; the BQL is however
> not held during the exit(status) call in qemu_default_main().
>
> Instead of attempting to ensure that we always call exit() with the BQL
> held - including any transitive calls - this change adds a BQL lock guard
> to qemu_run_exit_notifiers(), ensuring the BQL will always be held in the
> exit notifiers.
>
> Additionally, the BQL promise is now documented at the
> qemu_{add,remove}_exit_notifier() declarations.
>
> Fixes: f5ab12caba4f ("ui & main loop: Redesign of system-specific main
> thread event handling")
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2771
> Reported-by: David Woodhouse <dwmw2@infradead.org>
> Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
I'm worried that this breaks for exit() calls that happen within a
BQL-taken area (for example, anything that uses error_fatal) due to...
void bql_lock_impl(const char *file, int line)
{
QemuMutexLockFunc bql_lock_fn = qatomic_read(&bql_mutex_lock_func);
g_assert(!bql_locked()); // <--- this
bql_lock_fn(&bql, file, line);
set_bql_locked(true);
}
Paolo
* Re: [PATCH] system/runstate: Fix regression, clarify BQL status of exit notifiers
2025-01-15 19:05 ` Paolo Bonzini
@ 2025-01-15 19:17 ` Phil Dennis-Jordan
2025-01-16 8:34 ` David Woodhouse
0 siblings, 1 reply; 5+ messages in thread
From: Phil Dennis-Jordan @ 2025-01-15 19:17 UTC
To: Paolo Bonzini; +Cc: qemu-devel, philmd, akihiko.odaki, dwmw2
On Wed 15. Jan 2025 at 20:05, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 1/12/25 22:26, Phil Dennis-Jordan wrote:
> > By changing the way the main QEMU event loop is invoked, I inadvertently
> > changed the BQL status of exit notifiers: some of them implicitly
> > assumed they would be called with the BQL held; the BQL is however
> > not held during the exit(status) call in qemu_default_main().
> >
> > Instead of attempting to ensure that we always call exit() with the BQL
> > held - including any transitive calls - this change adds a BQL lock guard
> > to qemu_run_exit_notifiers(), ensuring the BQL will always be held in the
> > exit notifiers.
> >
> > Additionally, the BQL promise is now documented at the
> > qemu_{add,remove}_exit_notifier() declarations.
> >
> > Fixes: f5ab12caba4f ("ui & main loop: Redesign of system-specific main
> > thread event handling")
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2771
> > Reported-by: David Woodhouse <dwmw2@infradead.org>
> > Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
>
> I'm worried that this breaks for exit() calls that happen within a
> BQL-taken area (for example, anything that uses error_fatal) due to...
>
> void bql_lock_impl(const char *file, int line)
> {
> QemuMutexLockFunc bql_lock_fn = qatomic_read(&bql_mutex_lock_func);
>
> g_assert(!bql_locked()); // <--- this
> bql_lock_fn(&bql, file, line);
> set_bql_locked(true);
> }
>
BQL_LOCK_GUARD expands to a call to bql_auto_lock(), which in turn defends
against recursive locking by checking bql_locked().
https://gitlab.com/qemu-project/qemu/-/blob/master/include/qemu/main-loop.h#L377
I think that should make it safe?
The only safety issue I can imagine is exit() being called in a thread
where the BQL is not held while a BQL-holding thread is waiting for that
thread. I’m not sure such a pattern exists in QEMU, though, and it would
have triggered the assertion in the original code (before the patch that
caused the regression was applied).
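
For illustration, a minimal sketch of the guard pattern described above:
take the lock only if the current thread does not already hold it, and
release it on scope exit only if the guard took it. This is not QEMU's
implementation. Every demo_* name and the DEMO_LOCK_GUARD() macro are
hypothetical stand-ins, and the sketch assumes GCC or Clang (for the
cleanup attribute) and C11 thread-local storage.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the BQL; none of these names are QEMU's. */
static pthread_mutex_t demo_bql = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local bool demo_bql_held; /* does this thread hold it? */

static bool demo_bql_locked(void)
{
    return demo_bql_held;
}

/* Plain lock: asserts on recursive use, like the function quoted above. */
static void demo_bql_lock(void)
{
    assert(!demo_bql_locked());
    pthread_mutex_lock(&demo_bql);
    demo_bql_held = true;
}

static void demo_bql_unlock(void)
{
    assert(demo_bql_locked());
    demo_bql_held = false;
    pthread_mutex_unlock(&demo_bql);
}

/* Guard state: remembers whether this guard is the one that locked. */
typedef struct {
    bool took_lock;
} DemoGuard;

static DemoGuard demo_guard_enter(void)
{
    DemoGuard g = { .took_lock = false };

    if (!demo_bql_locked()) { /* skip the lock if already held */
        demo_bql_lock();
        g.took_lock = true;
    }
    return g;
}

static void demo_guard_leave(DemoGuard *g)
{
    if (g->took_lock) { /* only undo what this guard did */
        demo_bql_unlock();
    }
}

/* Scope guard built on the GCC/Clang cleanup attribute (an assumption). */
#define DEMO_LOCK_GUARD() \
    DemoGuard demo_guard_ \
        __attribute__((cleanup(demo_guard_leave), unused)) = \
        demo_guard_enter()

/* Reports whether the lock was held inside the guarded region. */
static bool demo_run_notifiers(void)
{
    DEMO_LOCK_GUARD();
    return demo_bql_locked();
}
```

In this sketch demo_run_notifiers() can be called both with and without
the lock held. In the first case the guard does nothing and the caller
still holds the lock afterwards; in the second the lock is taken and
dropped again on return.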
* Re: [PATCH] system/runstate: Fix regression, clarify BQL status of exit notifiers
2025-01-15 19:17 ` Phil Dennis-Jordan
@ 2025-01-16 8:34 ` David Woodhouse
0 siblings, 0 replies; 5+ messages in thread
From: David Woodhouse @ 2025-01-16 8:34 UTC
To: Phil Dennis-Jordan, Paolo Bonzini; +Cc: qemu-devel, philmd, akihiko.odaki
On Wed, 2025-01-15 at 20:17 +0100, Phil Dennis-Jordan wrote:
>
> BQL_LOCK_GUARD expands to a call to bql_auto_lock(), which in turn
> defends against recursive locking by checking bql_locked().
>
> https://gitlab.com/qemu-project/qemu/-/blob/master/include/qemu/main-loop.h#L377
>
> I think that should make it safe?
Looks like it. I did this to test:
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -451,6 +451,10 @@ void xen_evtchn_set_callback_level(int level)
if (level && !s->extern_gsi_level) {
kvm_xen_set_callback_asserted();
}
+ if (level) {
+ printf("Exiting, BQL held\n");
+ exit(77);
+ }
}
}
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -851,6 +851,7 @@ void qemu_remove_exit_notifier(Notifier *notify)
static void qemu_run_exit_notifiers(void)
{
BQL_LOCK_GUARD();
+ printf("%s has BQL\n", __func__);
notifier_list_notify(&exit_notifiers, NULL);
}
So the first time a Xen guest's callback IRQ is asserted, it exited
with the BQL held, and qemu_run_exit_notifiers() didn't get stuck.
[ 0.521568] ACPI: \_SB_.GSIF: Enabled at IRQ 21
Exiting, BQL held
qemu_run_exit_notifiers has BQL
The actual cleanup of the XenDevice did then deadlock on the Xen evtchn
port_lock, which had *also* been held when my hack exited in the evtchn
code. But that one is expected.
#0 0x00007fc5b2a7b0c0 in __lll_lock_wait () at /lib64/libc.so.6
#1 0x00007fc5b2a81d81 in pthread_mutex_lock@@GLIBC_2.2.5 ()
at /lib64/libc.so.6
#2 0x0000558286c07a63 in qemu_mutex_lock_impl
(mutex=0x558294179998, file=0x558286f9b905 "../hw/i386/kvm/xen_evtchn.c", line=2147) at ../util/qemu-thread-posix.c:95
#3 0x00005582868d774f in xen_be_evtchn_unbind (xc=0x5582939b3810, port=2)
at ../hw/i386/kvm/xen_evtchn.c:2147
#4 0x000055828679e0a9 in qemu_xen_evtchn_unbind
(xc=<optimized out>, port=<optimized out>)
at /home/dwmw2/git/qemu/include/hw/xen/xen_backend_ops.h:91
#5 xen_device_unbind_event_channel
(xendev=<optimized out>, channel=0x5582939b4cb0, errp=0x0)
at ../hw/xen/xen-bus.c:961
#6 0x00005582865f64b9 in xen_console_disconnect
(xendev=xendev@entry=0x5582942df4a0, errp=errp@entry=0x0)
at ../hw/char/xen_console.c:298
#7 0x00005582865f6673 in xen_console_unrealize (xendev=0x5582942df4a0)
at ../hw/char/xen_console.c:411
#8 0x000055828679e201 in xen_device_unrealize (dev=<optimized out>)
at ../hw/xen/xen-bus.c:988
#9 0x0000558286c0da5f in notifier_list_notify (list=<optimized out>, data=0x0)
at ../util/notify.c:39
#10 0x00007fc5b2a2a461 in __run_exit_handlers () at /lib64/libc.so.6
#11 0x00007fc5b2a2a52e in exit () at /lib64/libc.so.6
#12 0x00005582868d86dd in xen_evtchn_set_callback_level (level=1)
at ../hw/i386/kvm/xen_evtchn.c:456
#13 0x00005582868d7c74 in inject_callback
(s=0x558294179650, vcpu=<optimized out>) at ../hw/i386/kvm/xen_evtchn.c:548
#14 do_set_port_compat
(s=<optimized out>, port=<optimized out>, shinfo=<optimized out>, vcpu_info=<optimized out>) at ../hw/i386/kvm/xen_evtchn.c:921
#15 set_port_pending (s=s@entry=0x558294179650, port=<optimized out>)
at ../hw/i386/kvm/xen_evtchn.c:963
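
The port_lock hang in the backtrace above is the usual self-deadlock on
a non-recursive mutex: the thread that called exit() still held the lock
that the cleanup path then tried to take. A small sketch of the same
shape, using a POSIX error-checking mutex so that the second lock
attempt reports EDEADLK instead of hanging. The demo_* names are
hypothetical and this is not the Xen evtchn code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a non-recursive device lock. */
static pthread_mutex_t demo_port_lock;

/* Simulates an exit-time cleanup path that needs the device lock. */
static int demo_cleanup_take_lock(void)
{
    int rc = pthread_mutex_lock(&demo_port_lock);

    if (rc == 0) {
        pthread_mutex_unlock(&demo_port_lock);
    }
    return rc;
}

/*
 * Takes the lock, then runs the cleanup on the same thread while still
 * holding it, as happens when exit() is called inside the locked region.
 * Returns the result of the cleanup's lock attempt.
 */
static int demo_exit_while_locked(void)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    /* Error-checking type: a relock by the owner fails with EDEADLK. */
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&demo_port_lock, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&demo_port_lock);
    rc = demo_cleanup_take_lock(); /* would hang with a default mutex */
    pthread_mutex_unlock(&demo_port_lock);
    pthread_mutex_destroy(&demo_port_lock);
    return rc;
}
```

With a default mutex the second lock attempt would typically block
forever, which matches the __lll_lock_wait frame at the top of the
trace.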