* Question about graph locking preconditions regarding qemu_in_main_thread()

From: Fiona Ebner @ 2023-05-05 9:35 UTC
To: open list:Network Block Dev...
Cc: Kevin Wolf, Hanna Czenczek, Paolo Bonzini, QEMU Developers, Thomas Lamprecht

Hi,

I noticed that the bdrv_graph_co_rd_lock() and bdrv_graph_co_rd_unlock()
functions use qemu_in_main_thread() as a conditional to return early.
What high-level requirements ensure that qemu_in_main_thread() will
evaluate to the same value during locking and unlocking?

This paragraph assumes no iothreads are used for simplicity. One
requirement is: don't call bdrv_* functions without the BQL. Well,
snapshot-save does just that during setup, but it might be fine, because
it's done after calling vm_stop() and within a bdrv_drain_all section
(there is another issue however [0], so it still makes sense to change
snapshot-save to hold the BQL during setup).

But a variation without the vm_stop() and drained section will be able
to produce a negative reader count, see below [1][2]. The issue
obviously depends on dropping the BQL, but I'm not entirely sure it
depends on the bdrv_* call or if steps 3. and 4. in [2] could also
happen in some other scenario.

I'm not aware of any actual issues in QEMU :) But I thought I'd better ask.

Best Regards,
Fiona

[0]: https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg05415.html

[1]: Example QMP function inspired by snapshot-save

> void qmp_unlock_write_lock(Error **errp)
> {
>     Error *local_err = NULL;
>     QDict *options = qdict_new();
>     const uint8_t *buf = malloc(1000);
>
>     /*
>     vm_stop(RUN_STATE_SAVE_VM);
>     bdrv_drain_all_begin();
>     */
>
>     qdict_put_str(options, "driver", "qcow2");
>     BlockBackend *bb = blk_new_open("/tmp/disk.qcow2", NULL, options, BDRV_O_RDWR, &local_err);
>     if (!bb) {
>         error_set(errp, ERROR_CLASS_GENERIC_ERROR, "failed to open");
>     } else {
>         qemu_mutex_unlock_iothread();
>         bdrv_save_vmstate(blk_bs(bb), buf, 0, 1000);
>         qemu_mutex_lock_iothread();
>         blk_unref(bb);
>     }
>
>     /*
>     bdrv_drain_all_end();
>     vm_start();
>     */
> }

[2]:

In the output below, the boolean value after the backtraces of
bdrv_graph_co_rd(un)lock is the value of qemu_in_main_thread().

AFAICT, what happened below is:
1. QMP function is executed in the main thread and drops the BQL.
2. bdrv_co_writev_vmstate_entry is called, increasing the reader count,
   because qemu_in_main_thread() is false.
3. A vCPU thread issued a write, not increasing the reader count,
   because qemu_in_main_thread() is true.
4. The write is finished in the main thread, decreasing the reader
   count, because qemu_in_main_thread() is false.
5. bdrv_co_writev_vmstate_entry is finished in the main thread,
   decreasing the reader count, because qemu_in_main_thread() is false.
6. The assertion that the reader count is non-negative fails (during
   blk_unref(), bdrv_graph_wrlock() is called which triggers the
   assert, not shown below).
> Thread 1 "qemu-system-x86" hit Breakpoint 3, qmp_unlock_write_lock (errp=0x7fffffffd740) at ../migration/savevm.c:3410 > 3410 qemu_mutex_unlock_iothread(); > > Thread 1 "qemu-system-x86" hit Breakpoint 5, bdrv_graph_co_rdlock () at ../block/graph-lock.c:161 > 161 { > #0 bdrv_graph_co_rdlock () at ../block/graph-lock.c:161 > #1 0x0000555555e8e4f3 in bdrv_co_writev_vmstate_entry (opaque=0x7fffffffd600) at block/block-gen.c:784 > #2 0x0000555556053c15 in coroutine_trampoline (i0=1457248608, i1=21845) at ../util/coroutine-ucontext.c:177 > #3 0x00007ffff6169d40 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 > #4 0x00007fffffffc950 in ?? () > #5 0x0000000000000000 in ?? () > $51 = false > [Switching to Thread 0x7ffff0a47700 (LWP 54187)] > > Thread 7 "CPU 0/KVM" hit Breakpoint 5, bdrv_graph_co_rdlock () at ../block/graph-lock.c:161 > 161 { > #0 bdrv_graph_co_rdlock () at ../block/graph-lock.c:161 > #1 0x0000555555ebf2e3 in graph_lockable_auto_lock (x=0x7fff4bfffdf3) at /home/febner/repos/qemu/include/block/graph-lock.h:214 > #2 0x0000555555ec20d6 in blk_co_do_pwritev_part (blk=0x555556dac400, offset=145920, bytes=512, qiov=0x7fff4bfffef0, qiov_offset=0, flags=0) at ../block/block-backend.c:1367 > #3 0x0000555555ec2234 in blk_co_pwritev_part (blk=0x555556dac400, offset=145920, bytes=512, qiov=0x7fff4bfffef0, qiov_offset=0, flags=0) at ../block/block-backend.c:1404 > #4 0x0000555555ec2311 in blk_co_pwritev (blk=0x555556dac400, offset=145920, bytes=512, qiov=0x7fff4bfffef0, flags=0) at ../block/block-backend.c:1426 > #5 0x0000555555ec22bb in blk_co_pwrite (blk=0x555556dac400, offset=145920, bytes=512, buf=0x7fff5ae23a00, flags=0) at ../block/block-backend.c:1418 > #6 0x0000555555e8fc60 in blk_co_pwrite_entry (opaque=0x7ffff0a41eb0) at block/block-gen.c:1624 > #7 0x0000555556053c15 in coroutine_trampoline (i0=-535577088, i1=32767) at ../util/coroutine-ucontext.c:177 > #8 0x00007ffff6169d40 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 > #9 0x00007ffff0a41690 in ?? () > #10 0x0000000000000000 in ?? 
() > $52 = true > [Switching to Thread 0x7ffff3901280 (LWP 54113)] > > Thread 1 "qemu-system-x86" hit Breakpoint 6, bdrv_graph_co_rdunlock () at ../block/graph-lock.c:231 > 231 bdrv_graph = qemu_get_current_aio_context()->bdrv_graph; > #0 bdrv_graph_co_rdunlock () at ../block/graph-lock.c:231 > #1 0x0000555555ebf2fa in graph_lockable_auto_unlock (x=0x7fff4bfffdf3) at /home/febner/repos/qemu/include/block/graph-lock.h:221 > #2 0x0000555555ebf31c in glib_autoptr_clear_GraphLockable (_ptr=0x7fff4bfffdf3) at /home/febner/repos/qemu/include/block/graph-lock.h:224 > #3 0x0000555555ebf33a in glib_autoptr_cleanup_GraphLockable (_ptr=0x7fff4bfffdf8) at /home/febner/repos/qemu/include/block/graph-lock.h:224 > #4 0x0000555555ec21c6 in blk_co_do_pwritev_part (blk=0x555556dac400, offset=145920, bytes=512, qiov=0x7fff4bfffef0, qiov_offset=0, flags=0) at ../block/block-backend.c:1367 > #5 0x0000555555ec2234 in blk_co_pwritev_part (blk=0x555556dac400, offset=145920, bytes=512, qiov=0x7fff4bfffef0, qiov_offset=0, flags=0) at ../block/block-backend.c:1404 > #6 0x0000555555ec2311 in blk_co_pwritev (blk=0x555556dac400, offset=145920, bytes=512, qiov=0x7fff4bfffef0, flags=0) at ../block/block-backend.c:1426 > #7 0x0000555555ec22bb in blk_co_pwrite (blk=0x555556dac400, offset=145920, bytes=512, buf=0x7fff5ae23a00, flags=0) at ../block/block-backend.c:1418 > #8 0x0000555555e8fc60 in blk_co_pwrite_entry (opaque=0x7ffff0a41eb0) at block/block-gen.c:1624 > #9 0x0000555556053c15 in coroutine_trampoline (i0=-535577088, i1=32767) at ../util/coroutine-ucontext.c:177 > #10 0x00007ffff6169d40 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 > #11 0x00007ffff0a41690 in ?? () > #12 0x0000000000000000 in ?? () > $53 = false > > Thread 1 "qemu-system-x86" hit Breakpoint 6, bdrv_graph_co_rdunlock () at ../block/graph-lock.c:231 > 231 bdrv_graph = qemu_get_current_aio_context()->bdrv_graph; > #0 bdrv_graph_co_rdunlock () at ../block/graph-lock.c:231 > #1 0x0000555555e8e522 in bdrv_co_writev_vmstate_entry (opaque=0x7fffffffd600) at block/block-gen.c:786 > #2 0x0000555556053c15 in coroutine_trampoline (i0=1457248608, i1=21845) at ../util/coroutine-ucontext.c:177 > #3 0x00007ffff6169d40 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 > #4 0x00007fffffffc950 in ?? () > #5 0x0000000000000000 in ?? () > $54 = false > > Thread 1 "qemu-system-x86" hit Breakpoint 4, qmp_unlock_write_lock (errp=0x7fffffffd740) at ../migration/savevm.c:3412 > 3412 qemu_mutex_lock_iothread(); > qemu-system-x86_64: ../block/graph-lock.c:105: reader_count: Assertion `(int32_t)rd >= 0' failed. ^ permalink raw reply [flat|nested] 5+ messages in thread
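The imbalance in steps 1.-6. above can be modelled with a few lines of
standalone C. This is a deliberately simplified sketch, not the actual
QEMU graph-lock code: reader_count stands in for the per-AioContext
counter checked by reader_count()/bdrv_graph_wrlock(), and
lock_fast_path stands in for whatever qemu_in_main_thread() happened to
return at each point in the trace.

    #include <assert.h>
    #include <stdbool.h>

    static int reader_count;
    static bool lock_fast_path;   /* placeholder for qemu_in_main_thread() */

    static void co_rdlock(void)
    {
        if (lock_fast_path) {
            return;               /* early return: the BQL alone is trusted */
        }
        reader_count++;
    }

    static void co_rdunlock(void)
    {
        if (lock_fast_path) {
            return;
        }
        reader_count--;
    }

    int main(void)
    {
        /* Step 3: the write starts in a vCPU thread holding the BQL,
         * so the early return is taken and the count stays at 0. */
        lock_fast_path = true;
        co_rdlock();

        /* Step 4: the write completes in the main thread, which has
         * dropped the BQL, so the early return is NOT taken on unlock. */
        lock_fast_path = false;
        co_rdunlock();

        /* Fails, mirroring the reader_count assertion in graph-lock.c. */
        assert(reader_count >= 0);
        return 0;
    }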
* Re: Question about graph locking preconditions regarding qemu_in_main_thread()

From: Kevin Wolf @ 2023-05-08 10:40 UTC
To: Fiona Ebner
Cc: open list:Network Block Dev..., Hanna Czenczek, eesposit, Paolo Bonzini, QEMU Developers, Thomas Lamprecht

On 05.05.2023 at 11:35, Fiona Ebner wrote:
> Hi,
> I noticed that the bdrv_graph_co_rd_lock() and bdrv_graph_co_rd_unlock()
> functions use qemu_in_main_thread() as a conditional to return early.
> What high-level requirements ensure that qemu_in_main_thread() will
> evaluate to the same value during locking and unlocking?

I think it's actually wrong.

There are some conditions under which we don't actually need to do
anything for locking: Writing the graph is only possible from the main
context while holding the BQL. So if a reader is running in the main
context under the BQL and knows that it won't be interrupted until the
next writer runs, we don't actually need to do anything.

This is the case if the reader code neither has a nested event loop
(this is forbidden anyway while you hold the lock) nor is a coroutine
(because a writer could run when the coroutine has yielded).

These conditions are exactly what bdrv_graph_rdlock_main_loop() asserts.
They are not fulfilled in bdrv_graph_co_rdlock, which always runs in a
coroutine.

> This paragraph assumes no iothreads are used for simplicity. One
> requirement is: don't call bdrv_* functions without the BQL. Well,
> snapshot-save does just that during setup, but it might be fine,
> because it's done after calling vm_stop() and within a bdrv_drain_all
> section (there is another issue however [0], so it still makes sense
> to change snapshot-save to hold the BQL during setup).

Hm, maybe it happens to be safe if all other threads that could take
the BQL are idle. After vm_stop(), the most important ones - the vcpu
threads - shouldn't cause trouble any more. Snapshots don't work during
a migration, so a migration thread might actually not be a problem in
this case either. I'm less sure about any worker threads, etc.

The easy answer is: We shouldn't violate the rule in the first place
and just take the BQL.

> But a variation without the vm_stop() and drained section will be able
> to produce a negative reader count, see below [1][2]. The issue
> obviously depends on dropping the BQL, but I'm not entirely sure it
> depends on the bdrv_* call or if steps 3. and 4. in [2] could also
> happen in some other scenario.

Yes, if your locking is wrong and correct operation relies on some
specific circumstances, then it's easy to accidentally turn it into a
real bug by calling the same function in a slightly different
environment.

> I'm not aware of any actual issues in QEMU :) But I thought I'd
> better ask.
> > Best Regards, > Fiona > > [0]: https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg05415.html > > [1]: Example QMP function inspired by snapshot-save > > > void qmp_unlock_write_lock(Error **errp) > > { > > Error *local_err = NULL; > > QDict *options = qdict_new(); > > const uint8_t *buf = malloc(1000); > > > > /* > > vm_stop(RUN_STATE_SAVE_VM); > > bdrv_drain_all_begin(); > > */ > > > > qdict_put_str(options, "driver", "qcow2"); > > BlockBackend *bb = blk_new_open("/tmp/disk.qcow2", NULL, options, BDRV_O_RDWR, &local_err); > > if (!bb) { > > error_set(errp, ERROR_CLASS_GENERIC_ERROR, "failed to open"); > > } else { > > qemu_mutex_unlock_iothread(); > > bdrv_save_vmstate(blk_bs(bb), buf, 0, 1000); > > qemu_mutex_lock_iothread(); > > blk_unref(bb); > > } > > > > /* > > bdrv_drain_all_end(); > > vm_start(); > > */ > > } > > [2]: > > In the output below, the boolean value after the backtraces of > bdrv_graph_co_rd(un)lock is the value of qemu_in_main_thread(). > > AFAICT, what happened below is: > 1. QMP function is executed in the main thread and drops the BQL. > 2. bdrv_co_writev_vmstate_entry is called, increasing the reader count, > because qemu_in_main_thread() is false. > 3. A vCPU thread issued a write, not increasing the reader count, > because qemu_in_main_thread() is true. > 4. The write is finished in the main thread, decreasing the reader > count, because qemu_in_main_thread() is false. This one is weird. Why don't we hold the BQL there? Ah, I actually have an idea: Maybe it is executed by the nested event loop in bdrv_writev_vmstate(). This nested event loop runs now without the BQL, too, and it can call any sort of callback in QEMU that really expects to be called with the BQL. Scary! If this is what happens, you actually have your proof that not holding the BQL can cause problems even if there is no other thread active that could interfere. Can you try to confirm this in gdb? In case you haven't debugged coroutines much yet: If you have this state in gdb for a running instance, you can repeat 'fin' until you reach the next coroutine switch, and then you should be able to get the stack trace to see who entered the coroutine. > 5. bdrv_co_writev_vmstate_entry is finished in the main thread, > decreasing the reader count, because qemu_in_main_thread() is false. > 6. The assertion that the reader count is non-negative fails (during > blk_unref(), bdrv_graph_wrlock() is called which triggers the assert, > not shown below). Kevin > > Thread 1 "qemu-system-x86" hit Breakpoint 3, qmp_unlock_write_lock (errp=0x7fffffffd740) at ../migration/savevm.c:3410 > > 3410 qemu_mutex_unlock_iothread(); > > > > Thread 1 "qemu-system-x86" hit Breakpoint 5, bdrv_graph_co_rdlock () at ../block/graph-lock.c:161 > > 161 { > > #0 bdrv_graph_co_rdlock () at ../block/graph-lock.c:161 > > #1 0x0000555555e8e4f3 in bdrv_co_writev_vmstate_entry (opaque=0x7fffffffd600) at block/block-gen.c:784 > > #2 0x0000555556053c15 in coroutine_trampoline (i0=1457248608, i1=21845) at ../util/coroutine-ucontext.c:177 > > #3 0x00007ffff6169d40 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 > > #4 0x00007fffffffc950 in ?? () > > #5 0x0000000000000000 in ?? 
() > > $51 = false > > [Switching to Thread 0x7ffff0a47700 (LWP 54187)] > > > > Thread 7 "CPU 0/KVM" hit Breakpoint 5, bdrv_graph_co_rdlock () at ../block/graph-lock.c:161 > > 161 { > > #0 bdrv_graph_co_rdlock () at ../block/graph-lock.c:161 > > #1 0x0000555555ebf2e3 in graph_lockable_auto_lock (x=0x7fff4bfffdf3) at /home/febner/repos/qemu/include/block/graph-lock.h:214 > > #2 0x0000555555ec20d6 in blk_co_do_pwritev_part (blk=0x555556dac400, offset=145920, bytes=512, qiov=0x7fff4bfffef0, qiov_offset=0, flags=0) at ../block/block-backend.c:1367 > > #3 0x0000555555ec2234 in blk_co_pwritev_part (blk=0x555556dac400, offset=145920, bytes=512, qiov=0x7fff4bfffef0, qiov_offset=0, flags=0) at ../block/block-backend.c:1404 > > #4 0x0000555555ec2311 in blk_co_pwritev (blk=0x555556dac400, offset=145920, bytes=512, qiov=0x7fff4bfffef0, flags=0) at ../block/block-backend.c:1426 > > #5 0x0000555555ec22bb in blk_co_pwrite (blk=0x555556dac400, offset=145920, bytes=512, buf=0x7fff5ae23a00, flags=0) at ../block/block-backend.c:1418 > > #6 0x0000555555e8fc60 in blk_co_pwrite_entry (opaque=0x7ffff0a41eb0) at block/block-gen.c:1624 > > #7 0x0000555556053c15 in coroutine_trampoline (i0=-535577088, i1=32767) at ../util/coroutine-ucontext.c:177 > > #8 0x00007ffff6169d40 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 > > #9 0x00007ffff0a41690 in ?? () > > #10 0x0000000000000000 in ?? () > > $52 = true > > [Switching to Thread 0x7ffff3901280 (LWP 54113)] > > > > Thread 1 "qemu-system-x86" hit Breakpoint 6, bdrv_graph_co_rdunlock () at ../block/graph-lock.c:231 > > 231 bdrv_graph = qemu_get_current_aio_context()->bdrv_graph; > > #0 bdrv_graph_co_rdunlock () at ../block/graph-lock.c:231 > > #1 0x0000555555ebf2fa in graph_lockable_auto_unlock (x=0x7fff4bfffdf3) at /home/febner/repos/qemu/include/block/graph-lock.h:221 > > #2 0x0000555555ebf31c in glib_autoptr_clear_GraphLockable (_ptr=0x7fff4bfffdf3) at /home/febner/repos/qemu/include/block/graph-lock.h:224 > > #3 0x0000555555ebf33a in glib_autoptr_cleanup_GraphLockable (_ptr=0x7fff4bfffdf8) at /home/febner/repos/qemu/include/block/graph-lock.h:224 > > #4 0x0000555555ec21c6 in blk_co_do_pwritev_part (blk=0x555556dac400, offset=145920, bytes=512, qiov=0x7fff4bfffef0, qiov_offset=0, flags=0) at ../block/block-backend.c:1367 > > #5 0x0000555555ec2234 in blk_co_pwritev_part (blk=0x555556dac400, offset=145920, bytes=512, qiov=0x7fff4bfffef0, qiov_offset=0, flags=0) at ../block/block-backend.c:1404 > > #6 0x0000555555ec2311 in blk_co_pwritev (blk=0x555556dac400, offset=145920, bytes=512, qiov=0x7fff4bfffef0, flags=0) at ../block/block-backend.c:1426 > > #7 0x0000555555ec22bb in blk_co_pwrite (blk=0x555556dac400, offset=145920, bytes=512, buf=0x7fff5ae23a00, flags=0) at ../block/block-backend.c:1418 > > #8 0x0000555555e8fc60 in blk_co_pwrite_entry (opaque=0x7ffff0a41eb0) at block/block-gen.c:1624 > > #9 0x0000555556053c15 in coroutine_trampoline (i0=-535577088, i1=32767) at ../util/coroutine-ucontext.c:177 > > #10 0x00007ffff6169d40 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 > > #11 0x00007ffff0a41690 in ?? () > > #12 0x0000000000000000 in ?? 
() > > $53 = false > > > > Thread 1 "qemu-system-x86" hit Breakpoint 6, bdrv_graph_co_rdunlock () at ../block/graph-lock.c:231 > > 231 bdrv_graph = qemu_get_current_aio_context()->bdrv_graph; > > #0 bdrv_graph_co_rdunlock () at ../block/graph-lock.c:231 > > #1 0x0000555555e8e522 in bdrv_co_writev_vmstate_entry (opaque=0x7fffffffd600) at block/block-gen.c:786 > > #2 0x0000555556053c15 in coroutine_trampoline (i0=1457248608, i1=21845) at ../util/coroutine-ucontext.c:177 > > #3 0x00007ffff6169d40 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 > > #4 0x00007fffffffc950 in ?? () > > #5 0x0000000000000000 in ?? () > > $54 = false > > > > Thread 1 "qemu-system-x86" hit Breakpoint 4, qmp_unlock_write_lock (errp=0x7fffffffd740) at ../migration/savevm.c:3412 > > 3412 qemu_mutex_lock_iothread(); > > qemu-system-x86_64: ../block/graph-lock.c:105: reader_count: Assertion `(int32_t)rd >= 0' failed. > ^ permalink raw reply [flat|nested] 5+ messages in thread
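Following Kevin's "easy answer" above - don't drop the BQL in the first
place - the example command from [1] would become something like the
sketch below. This is only an illustration, not a tested patch:
qmp_unlock_write_lock remains the hypothetical command from Fiona's
example (placed in migration/savevm.c in her test), and error handling
is kept minimal.

    void qmp_unlock_write_lock(Error **errp)
    {
        Error *local_err = NULL;
        QDict *options = qdict_new();
        g_autofree uint8_t *buf = g_malloc(1000);

        qdict_put_str(options, "driver", "qcow2");
        BlockBackend *bb = blk_new_open("/tmp/disk.qcow2", NULL, options,
                                        BDRV_O_RDWR, &local_err);
        if (!bb) {
            error_propagate(errp, local_err);
            return;
        }

        /*
         * No qemu_mutex_unlock_iothread()/qemu_mutex_lock_iothread() pair:
         * the nested event loop inside the generated bdrv_writev_vmstate()
         * wrapper now runs with the BQL held, as its callbacks expect.
         */
        bdrv_save_vmstate(blk_bs(bb), buf, 0, 1000);
        blk_unref(bb);
    }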
* Re: Question about graph locking preconditions regarding qemu_in_main_thread()

From: Fiona Ebner @ 2023-05-09 10:26 UTC
To: Kevin Wolf
Cc: open list:Network Block Dev..., Hanna Czenczek, eesposit, Paolo Bonzini, QEMU Developers, Thomas Lamprecht

On 08.05.23 at 12:40, Kevin Wolf wrote:
> On 05.05.2023 at 11:35, Fiona Ebner wrote:
>> Hi,
>> I noticed that the bdrv_graph_co_rd_lock() and bdrv_graph_co_rd_unlock()
>> functions use qemu_in_main_thread() as a conditional to return early.
>> What high-level requirements ensure that qemu_in_main_thread() will
>> evaluate to the same value during locking and unlocking?
>
> I think it's actually wrong.
>
> There are some conditions under which we don't actually need to do
> anything for locking: Writing the graph is only possible from the main
> context while holding the BQL. So if a reader is running in the main
> context under the BQL and knows that it won't be interrupted until the
> next writer runs, we don't actually need to do anything.
>
> This is the case if the reader code neither has a nested event loop
> (this is forbidden anyway while you hold the lock) nor is a coroutine
> (because a writer could run when the coroutine has yielded).

Sorry, I don't understand the first part. If bdrv_writev_vmstate() is
called, then, because it is a generated coroutine wrapper,
AIO_WAIT_WHILE()/aio_poll() is used. And that is the case regardless of
whether the lock is held or not, i.e. there is a nested event loop even
if the lock is held?

> These conditions are exactly what bdrv_graph_rdlock_main_loop() asserts.
> They are not fulfilled in bdrv_graph_co_rdlock, which always runs in a
> coroutine.

Thank you for the explanation!

>> In the output below, the boolean value after the backtraces of
>> bdrv_graph_co_rd(un)lock is the value of qemu_in_main_thread().
>>
>> AFAICT, what happened below is:
>> 1. QMP function is executed in the main thread and drops the BQL.
>> 2. bdrv_co_writev_vmstate_entry is called, increasing the reader count,
>> because qemu_in_main_thread() is false.
>> 3. A vCPU thread issued a write, not increasing the reader count,
>> because qemu_in_main_thread() is true.
>> 4. The write is finished in the main thread, decreasing the reader
>> count, because qemu_in_main_thread() is false.
>
> This one is weird. Why don't we hold the BQL there?
>
> Ah, I actually have an idea: Maybe it is executed by the nested event
> loop in bdrv_writev_vmstate(). This nested event loop runs now without
> the BQL, too, and it can call any sort of callback in QEMU that really
> expects to be called with the BQL. Scary!
>
> If this is what happens, you actually have your proof that not holding
> the BQL can cause problems even if there is no other thread active that
> could interfere.
>
> Can you try to confirm this in gdb? In case you haven't debugged
> coroutines much yet: If you have this state in gdb for a running
> instance, you can repeat 'fin' until you reach the next coroutine
> switch, and then you should be able to get the stack trace to see who
> entered the coroutine.

Thank you for the guidance!
Hope I did it correctly, but AFAICS, you are spot-on :) > Run till exit from #0 blk_co_pwrite_entry (opaque=0x7ffff1242eb0) at block/block-gen.c:1624 > coroutine_trampoline (i0=1543776144, i1=32767) at ../util/coroutine-ucontext.c:178 > 178 qemu_coroutine_switch(co, co->caller, COROUTINE_TERMINATE); > (gdb) s > [New Thread 0x7fff537f7700 (LWP 128641)] > [Thread 0x7fff537f7700 (LWP 128641) exited] > qemu_coroutine_switch (from_=0x7fff5c042790, to_=0x7ffff3901148, action=COROUTINE_TERMINATE) at ../util/coroutine-ucontext.c:298 > 298 { > (gdb) n > 299 CoroutineUContext *from = DO_UPCAST(CoroutineUContext, base, from_); > (gdb) > 300 CoroutineUContext *to = DO_UPCAST(CoroutineUContext, base, to_); > (gdb) > 302 void *fake_stack_save = NULL; > (gdb) > [New Thread 0x7fff537f7700 (LWP 128660)] > 304 set_current(to_); > (gdb) > [Thread 0x7fff537f7700 (LWP 128660) exited] > 306 ret = sigsetjmp(from->env, 0); > (gdb) n > [New Thread 0x7fff537f7700 (LWP 128762)] > [Thread 0x7fff537f7700 (LWP 128762) exited] > 307 if (ret == 0) { > (gdb) > 308 start_switch_fiber_asan(action, &fake_stack_save, to->stack, > (gdb) > 310 start_switch_fiber_tsan(&fake_stack_save, > (gdb) > [New Thread 0x7fff537f7700 (LWP 128868)] > 312 siglongjmp(to->env, action); > (gdb) s > [Thread 0x7fff537f7700 (LWP 128868) exited] > longjmp_compat (env=0x7ffff39011b0, val=2) at ../sysdeps/x86/nptl/pt-longjmp.c:62 > 62 ../sysdeps/x86/nptl/pt-longjmp.c: No such file or directory. > (gdb) > __libc_siglongjmp (env=0x7fff5c0c07e8, val=3) at ../setjmp/longjmp.c:30 > 30 ../setjmp/longjmp.c: No such file or directory. > (gdb) > _longjmp_unwind (env=0x7fff5c0c07e8, val=3) at ../sysdeps/nptl/jmp-unwind.c:30 > 30 ../sysdeps/nptl/jmp-unwind.c: No such file or directory. > (gdb) bt > #0 _longjmp_unwind (env=0x7fff5c0c07e8, val=3) at ../sysdeps/nptl/jmp-unwind.c:30 > #1 0x00007ffff6154961 in __libc_siglongjmp (env=0x7fff5c0c07e8, val=3) at ../setjmp/longjmp.c:30 > #2 0x00007ffff712cc59 in longjmp_compat (env=<optimized out>, val=<optimized out>) at ../sysdeps/x86/nptl/pt-longjmp.c:62 > #3 0x000055555605409b in qemu_coroutine_switch (from_=0x7ffff3901148, to_=0x7fff5c0c0780, action=COROUTINE_ENTER) at ../util/coroutine-ucontext.c:312 > #4 0x0000555556052214 in qemu_aio_coroutine_enter (ctx=0x555556b40f10, co=0x7fff5c0c0780) at ../util/qemu-coroutine.c:161 > #5 0x0000555556050ae3 in aio_co_enter (ctx=0x555556b40f10, co=0x7fff5c0c0780) at ../util/async.c:680 > #6 0x0000555556050a26 in aio_co_wake (co=0x7fff5c0c0780) at ../util/async.c:664 > #7 0x0000555556054ea9 in thread_pool_co_cb (opaque=0x7fff516dd880, ret=0) at ../util/thread-pool.c:284 > #8 0x0000555556054acb in thread_pool_completion_bh (opaque=0x555556b39c00) at ../util/thread-pool.c:199 > #9 0x000055555604fa34 in aio_bh_call (bh=0x555556dcca30) at ../util/async.c:155 > #10 0x000055555604fb3f in aio_bh_poll (ctx=0x555556b40f10) at ../util/async.c:184 > #11 0x0000555556033ce2 in aio_poll (ctx=0x555556b40f10, blocking=true) at ../util/aio-posix.c:721 > #12 0x0000555555e8d055 in bdrv_poll_co (s=0x7fffffffd610) at /home/febner/repos/qemu/block/block-gen.h:43 > #13 0x0000555555e8e606 in bdrv_writev_vmstate (bs=0x555557a6aed0, qiov=0x7fffffffd690, pos=0) at block/block-gen.c:809 > #14 0x0000555555ed75e3 in bdrv_save_vmstate (bs=0x555557a6aed0, buf=0x555556dc5000 "\020t\317VUU", pos=0, size=1000) at ../block/io.c:2786 > #15 0x0000555555b77528 in qmp_unlock_write_lock (errp=0x7fffffffd750) at ../migration/savevm.c:3411 > #16 0x0000555555fde0b8 in qmp_marshal_unlock_write_lock (args=0x7fffe8007e60, 
ret=0x7ffff2e94d98, errp=0x7ffff2e94d90) at qapi/qapi-commands-migration.c:1468 > #17 0x00005555560263b1 in do_qmp_dispatch_bh (opaque=0x7ffff2e94e30) at ../qapi/qmp-dispatch.c:128 > #18 0x000055555604fa34 in aio_bh_call (bh=0x7fffe8005e90) at ../util/async.c:155 > #19 0x000055555604fb3f in aio_bh_poll (ctx=0x555556b40f10) at ../util/async.c:184 > #20 0x0000555556033399 in aio_dispatch (ctx=0x555556b40f10) at ../util/aio-posix.c:421 > #21 0x000055555604ff7a in aio_ctx_dispatch (source=0x555556b40f10, callback=0x0, user_data=0x0) at ../util/async.c:326 > #22 0x00007ffff7544e6b in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0 > #23 0x00005555560516c0 in glib_pollfds_poll () at ../util/main-loop.c:290 > #24 0x000055555605173a in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:313 > #25 0x000055555605183f in main_loop_wait (nonblocking=0) at ../util/main-loop.c:592 > #26 0x0000555555b2c188 in qemu_main_loop () at ../softmmu/runstate.c:731 > #27 0x000055555586ea92 in qemu_default_main () at ../softmmu/main.c:37 > #28 0x000055555586eac8 in main (argc=60, argv=0x7fffffffdb28) at ../softmmu/main.c:48 Best Regards, Fiona ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Question about graph locking preconditions regarding qemu_in_main_thread()

From: Kevin Wolf @ 2023-05-09 13:53 UTC
To: Fiona Ebner
Cc: open list:Network Block Dev..., Hanna Czenczek, eesposit, Paolo Bonzini, QEMU Developers, Thomas Lamprecht

On 09.05.2023 at 12:26, Fiona Ebner wrote:
> On 08.05.23 at 12:40, Kevin Wolf wrote:
> > On 05.05.2023 at 11:35, Fiona Ebner wrote:
> >> Hi,
> >> I noticed that the bdrv_graph_co_rd_lock() and bdrv_graph_co_rd_unlock()
> >> functions use qemu_in_main_thread() as a conditional to return early.
> >> What high-level requirements ensure that qemu_in_main_thread() will
> >> evaluate to the same value during locking and unlocking?
> >
> > I think it's actually wrong.
> >
> > There are some conditions under which we don't actually need to do
> > anything for locking: Writing the graph is only possible from the main
> > context while holding the BQL. So if a reader is running in the main
> > context under the BQL and knows that it won't be interrupted until the
> > next writer runs, we don't actually need to do anything.
> >
> > This is the case if the reader code neither has a nested event loop
> > (this is forbidden anyway while you hold the lock) nor is a coroutine
> > (because a writer could run when the coroutine has yielded).
>
> Sorry, I don't understand the first part. If bdrv_writev_vmstate() is
> called, then, because it is a generated coroutine wrapper,
> AIO_WAIT_WHILE()/aio_poll() is used. And that is the case regardless of
> whether the lock is held or not, i.e. there is a nested event loop even
> if the lock is held?

With "lock" you mean the graph lock here, not the BQL, right?

These generated coroutine wrappers are a bit ugly because they behave
differently when called from a coroutine and when called outside of
coroutine context:

* In coroutine context, the caller MUST hold the lock
* Outside of coroutine context, the caller MUST NOT hold the lock

The second requirement comes from AIO_WAIT_WHILE(), like you say. If you
hold the lock and you're not in a coroutine, you simply can't call such
functions.

However, as bdrv_graph_co_rdlock() is a coroutine_fn, you won't usually
hold the lock outside of coroutine context anyway. But it's something to
be careful with when you have a writer lock.

> > These conditions are exactly what bdrv_graph_rdlock_main_loop() asserts.
> > They are not fulfilled in bdrv_graph_co_rdlock, which always runs in a
> > coroutine.
>
> Thank you for the explanation!
>
> >> In the output below, the boolean value after the backtraces of
> >> bdrv_graph_co_rd(un)lock is the value of qemu_in_main_thread().
> >>
> >> AFAICT, what happened below is:
> >> 1. QMP function is executed in the main thread and drops the BQL.
> >> 2. bdrv_co_writev_vmstate_entry is called, increasing the reader count,
> >> because qemu_in_main_thread() is false.
> >> 3. A vCPU thread issued a write, not increasing the reader count,
> >> because qemu_in_main_thread() is true.
> >> 4. The write is finished in the main thread, decreasing the reader
> >> count, because qemu_in_main_thread() is false.
> >
> > This one is weird. Why don't we hold the BQL there?
> >
> > Ah, I actually have an idea: Maybe it is executed by the nested event
> > loop in bdrv_writev_vmstate(). This nested event loop runs now without
> > the BQL, too, and it can call any sort of callback in QEMU that really
> > expects to be called with the BQL. Scary!
> >
> > If this is what happens, you actually have your proof that not holding
> > the BQL can cause problems even if there is no other thread active that
> > could interfere.
> >
> > Can you try to confirm this in gdb? In case you haven't debugged
> > coroutines much yet: If you have this state in gdb for a running
> > instance, you can repeat 'fin' until you reach the next coroutine
> > switch, and then you should be able to get the stack trace to see who
> > entered the coroutine.
>
> Thank you for the guidance! Hope I did it correctly, but AFAICS, you
> are spot-on :)

Yes, this looks like what I suspected. Thanks for confirming!

Kevin
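To make the two requirements above more concrete, here is a condensed
sketch of the general shape of such a generated wrapper, based on the
bdrv_poll_co()/block-gen.c frames visible in the backtrace earlier in
the thread. The state struct, its fields and the entry helper name are
simplified placeholders rather than the exact generated code; only the
QEMU primitives qemu_in_coroutine(), qemu_coroutine_create(),
qemu_coroutine_enter() and BDRV_POLL_WHILE() are assumed.

    /* Simplified shape of a generated coroutine wrapper (illustration only). */
    typedef struct {
        BlockDriverState *bs;
        QEMUIOVector *qiov;
        int64_t pos;
        int ret;
        bool done;
    } WrapperState;

    static void coroutine_fn wrapper_entry(void *opaque)
    {
        WrapperState *s = opaque;
        s->ret = bdrv_co_writev_vmstate(s->bs, s->qiov, s->pos);
        s->done = true;
    }

    int bdrv_writev_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
    {
        WrapperState s = { .bs = bs, .qiov = qiov, .pos = pos };

        if (qemu_in_coroutine()) {
            /* Coroutine context: the caller MUST already hold the graph rdlock. */
            wrapper_entry(&s);
        } else {
            /*
             * Non-coroutine context: a coroutine is spawned and a nested event
             * loop runs until it finishes.  That loop dispatches unrelated BHs
             * and I/O completions (which is how the vCPU write in the trace got
             * completed in the main thread), so the caller MUST NOT hold the
             * graph lock here - and dropping the BQL first makes all of those
             * callbacks run without the BQL they expect.
             */
            Coroutine *co = qemu_coroutine_create(wrapper_entry, &s);
            qemu_coroutine_enter(co);
            BDRV_POLL_WHILE(bs, !s.done);
        }
        return s.ret;
    }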
* Re: Question about graph locking preconditions regarding qemu_in_main_thread()

From: Fiona Ebner @ 2023-05-09 14:20 UTC
To: Kevin Wolf
Cc: open list:Network Block Dev..., Hanna Czenczek, eesposit, Paolo Bonzini, QEMU Developers, Thomas Lamprecht

On 09.05.23 at 15:53, Kevin Wolf wrote:
> On 09.05.2023 at 12:26, Fiona Ebner wrote:
>> On 08.05.23 at 12:40, Kevin Wolf wrote:
>>> On 05.05.2023 at 11:35, Fiona Ebner wrote:
>>>> Hi,
>>>> I noticed that the bdrv_graph_co_rd_lock() and bdrv_graph_co_rd_unlock()
>>>> functions use qemu_in_main_thread() as a conditional to return early.
>>>> What high-level requirements ensure that qemu_in_main_thread() will
>>>> evaluate to the same value during locking and unlocking?
>>>
>>> I think it's actually wrong.
>>>
>>> There are some conditions under which we don't actually need to do
>>> anything for locking: Writing the graph is only possible from the main
>>> context while holding the BQL. So if a reader is running in the main
>>> context under the BQL and knows that it won't be interrupted until the
>>> next writer runs, we don't actually need to do anything.
>>>
>>> This is the case if the reader code neither has a nested event loop
>>> (this is forbidden anyway while you hold the lock) nor is a coroutine
>>> (because a writer could run when the coroutine has yielded).
>>
>> Sorry, I don't understand the first part. If bdrv_writev_vmstate() is
>> called, then, because it is a generated coroutine wrapper,
>> AIO_WAIT_WHILE()/aio_poll() is used. And that is the case regardless of
>> whether the lock is held or not, i.e. there is a nested event loop even
>> if the lock is held?
>
> With "lock" you mean the graph lock here, not the BQL, right?

Oh, I actually meant the BQL. Since your "lock" refers to the graph
lock, that explains my confusion :)

> These generated coroutine wrappers are a bit ugly because they behave
> differently when called from a coroutine and when called outside of
> coroutine context:
>
> * In coroutine context, the caller MUST hold the lock
> * Outside of coroutine context, the caller MUST NOT hold the lock
>
> The second requirement comes from AIO_WAIT_WHILE(), like you say. If you
> hold the lock and you're not in a coroutine, you simply can't call such
> functions.
>
> However, as bdrv_graph_co_rdlock() is a coroutine_fn, you won't usually
> hold the lock outside of coroutine context anyway. But it's something to
> be careful with when you have a writer lock.

Best Regards,
Fiona