* QMP (without OOB) function running in thread different from the main thread as part of aio_poll
@ 2023-04-19 14:09 Fiona Ebner
2023-04-19 16:59 ` Paolo Bonzini
2023-04-20 6:11 ` Markus Armbruster
0 siblings, 2 replies; 21+ messages in thread
From: Fiona Ebner @ 2023-04-19 14:09 UTC (permalink / raw)
To: QEMU Developers
Cc: open list:Network Block Dev..., michael.roth, Markus Armbruster,
Fam Zheng, Stefan Hajnoczi, Paolo Bonzini, Thomas Lamprecht
Hi,
while debugging a completely different issue, I was surprised to see
do_qmp_dispatch_bh being run in a vCPU thread. I was under the
impression that QMP functions are supposed to be executed in the main
thread. Is that wrong?
I managed to reproduce the scenario once more with a build of upstream
QEMU v8.0.0-rc4 (again with GDB attached), but the reproducer is not
simple and it is racy. The backtrace is below[0].
My attempt at explaining the situation is:
1. In qapi/qmp-dispatch.c, the main thread schedules do_qmp_dispatch_bh,
because the coroutine context doesn't match.
2. The host OS switches to the vCPU thread.
3. Something in the vCPU thread triggers aio_poll with the main thread's
AioContext (in my case, a write to a pflash drive).
4. do_qmp_dispatch_bh is executed in the vCPU thread.
Could this be an artifact of running with a debugger?
I CC'ed the maintainers of util/aio-posix.c and qapi/qmp-dispatch.c
hoping that is not too far off.
Best Regards,
Fiona
[0]:
> Thread 5 "CPU 0/KVM" hit Breakpoint 2, do_qmp_dispatch_bh (opaque=0x7ffff2e96e30) at ../qapi/qmp-dispatch.c:124
> 124 QmpDispatchBH *data = opaque;
> #0 do_qmp_dispatch_bh (opaque=0x7ffff2e96e30) at ../qapi/qmp-dispatch.c:124
> #1 0x000055555604f50a in aio_bh_call (bh=0x7fffe4005430) at ../util/async.c:155
> #2 0x000055555604f615 in aio_bh_poll (ctx=0x555556b3e730) at ../util/async.c:184
> #3 0x00005555560337b8 in aio_poll (ctx=0x555556b3e730, blocking=true) at ../util/aio-posix.c:721
> #4 0x0000555555e8cf1c in bdrv_poll_co (s=0x7ffff1a45eb0) at /home/febner/repos/qemu/block/block-gen.h:43
> #5 0x0000555555e8fc3a in blk_pwrite (blk=0x555556daf840, offset=159232, bytes=512, buf=0x7ffee3226e00, flags=0) at block/block-gen.c:1650
> #6 0x0000555555908078 in pflash_update (pfl=0x555556d92300, offset=159232, size=1) at ../hw/block/pflash_cfi01.c:394
> #7 0x0000555555908749 in pflash_write (pfl=0x555556d92300, offset=159706, value=127, width=1, be=0) at ../hw/block/pflash_cfi01.c:522
> #8 0x0000555555908cda in pflash_mem_write_with_attrs (opaque=0x555556d92300, addr=159706, value=127, len=1, attrs=...) at ../hw/block/pflash_cfi01.c:681
> #9 0x0000555555d8936a in memory_region_write_with_attrs_accessor (mr=0x555556d926c0, addr=159706, value=0x7ffff1a460c8, size=1, shift=0, mask=255, attrs=...) at ../softmmu/memory.c:514
> #10 0x0000555555d894a9 in access_with_adjusted_size (addr=159706, value=0x7ffff1a460c8, size=1, access_size_min=1, access_size_max=4, access_fn=0x555555d89270 <memory_region_write_with_attrs_accessor>, mr=0x555556d926c0, attrs=...) at ../softmmu/memory.c:555
> #11 0x0000555555d8c5de in memory_region_dispatch_write (mr=0x555556d926c0, addr=159706, data=127, op=MO_8, attrs=...) at ../softmmu/memory.c:1522
> #12 0x0000555555d996f4 in flatview_write_continue (fv=0x7fffe843cc60, addr=4290932698, attrs=..., ptr=0x7ffff7fc5028, len=1, addr1=159706, l=1, mr=0x555556d926c0) at ../softmmu/physmem.c:2641
> #13 0x0000555555d99857 in flatview_write (fv=0x7fffe843cc60, addr=4290932698, attrs=..., buf=0x7ffff7fc5028, len=1) at ../softmmu/physmem.c:2683
> #14 0x0000555555d99c07 in address_space_write (as=0x555556a01b20 <address_space_memory>, addr=4290932698, attrs=..., buf=0x7ffff7fc5028, len=1) at ../softmmu/physmem.c:2779
> #15 0x0000555555d99c74 in address_space_rw (as=0x555556a01b20 <address_space_memory>, addr=4290932698, attrs=..., buf=0x7ffff7fc5028, len=1, is_write=true) at ../softmmu/physmem.c:2789
> #16 0x0000555555e2da88 in kvm_cpu_exec (cpu=0x555556ea10d0) at ../accel/kvm/kvm-all.c:2989
> #17 0x0000555555e3079a in kvm_vcpu_thread_fn (arg=0x555556ea10d0) at ../accel/kvm/kvm-accel-ops.c:51
> #18 0x000055555603825f in qemu_thread_start (args=0x555556b3c3b0) at ../util/qemu-thread-posix.c:541
> #19 0x00007ffff7125ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
> #20 0x00007ffff62c4a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll
  2023-04-19 14:09 QMP (without OOB) function running in thread different from the main thread as part of aio_poll Fiona Ebner
@ 2023-04-19 16:59 ` Paolo Bonzini
  2023-04-20  6:11 ` Markus Armbruster
  1 sibling, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2023-04-19 16:59 UTC (permalink / raw)
To: Fiona Ebner
Cc: QEMU Developers, open list:Network Block Dev..., Michael Roth,
	Markus Armbruster, Fam Zheng, Stefan Hajnoczi, Thomas Lamprecht

On Wed, Apr 19, 2023 at 16:11 Fiona Ebner <f.ebner@proxmox.com> wrote:

> Hi,
> while debugging a completely different issue, I was surprised to see
> do_qmp_dispatch_bh being run in a vCPU thread. I was under the
> impression that QMP functions are supposed to be executed in the main
> thread. Is that wrong?

While monitor commands in general start only in the main thread, bottom
halves can also run (via aio_poll) in the vCPU threads, always under the
BQL. If that's not valid, perhaps you can schedule the monitor's bottom
half in iohandler_get_aio_context() instead?

Paolo
* Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll
  2023-04-19 14:09 QMP (without OOB) function running in thread different from the main thread as part of aio_poll Fiona Ebner
  2023-04-19 16:59 ` Paolo Bonzini
@ 2023-04-20  6:11 ` Markus Armbruster
  2023-04-20  6:55   ` Paolo Bonzini
  1 sibling, 1 reply; 21+ messages in thread
From: Markus Armbruster @ 2023-04-20 6:11 UTC (permalink / raw)
To: Fiona Ebner
Cc: QEMU Developers, qemu-block, michael.roth, Fam Zheng,
	Stefan Hajnoczi, Paolo Bonzini, Thomas Lamprecht, Kevin Wolf, Peter Xu

Fiona Ebner <f.ebner@proxmox.com> writes:

> Hi,
> while debugging a completely different issue, I was surprised to see
> do_qmp_dispatch_bh being run in a vCPU thread. I was under the
> impression that QMP functions are supposed to be executed in the main
> thread. Is that wrong?
>
> I managed to reproduce the scenario with a build of upstream QEMU
> v8.0.0-rc4 once more (again with GDB attached), but it's not a simple
> reproducer and it's racy. The backtrace is below[0].
>
> My attempt at explaining the situation is:
> 1. In qapi/qmp-dispatch.c, the main thread schedules do_qmp_dispatch_bh,
> because the coroutine context doesn't match.
> 2. The host OS switches to the vCPU thread.
> 3. Something in the vCPU thread triggers aio_poll with the main thread's
> AioContext (in my case, a write to a pflash drive).
> 4. do_qmp_dispatch_bh is executed in the vCPU thread.
>
> Could this be an artifact of running with a debugger?
>
> I CC'ed the maintainers of util/aio-posix.c and qapi/qmp-dispatch.c
> hoping that is not too far off.

Fiona, thanks for reporting this!

The monitor core has become awfully complex. Instead of trying to
explain everything at once, let me recount its history. If you're
familiar with the monitor, you might want to skip ahead to "Paolo
wrote".

In the beginning, there was just one monitor, and it ran entirely in the
main loop (callback when input is available). To keep the main loop
going, monitor commands better complete quickly.

Then we got multiple monitors. Same story, just multiple input streams,
each with a callback.

We also got additional threads. When I say "main loop", I mean the main
thread's main loop.

"Must complete quickly" means no blocking I/O and such. Writing code
that way is somewhere between hard and impractical. Much code called by
monitor commands wasn't written that way. "Doctor, doctor, running code
in the main loop hurts". Sadly, the doctor's recommended remedy "don't
do that then" is really hard to apply: a lot of code has been written
assuming "running in the main loop, with the big QEMU lock held".

The first small step towards it was taken to enable the "out-of-band"
feature. We moved the QMP monitor proper out of the main loop into a
monitor I/O thread. The monitor commands get funneled to the main loop.
Instead of the main loop calling the monitor when a file descriptor has
input, it now calls the command dispatcher when a funnel queue has a
command. This is implemented with a bottom half (merge commit
4bdc24fa018, March 2018).

Why bother? Because now we can execute special "out-of-band" commands
right away, in the I/O thread, regardless of the main loop's state.
Peter Xu wanted this badly enough for postcopy recovery to code it up.
It was hard. It's not generally useful, as the restrictions on what OOB
commands can do are severe.

The next step was the coroutine feature. Quite a few of the problematic
monitor commands are actually running coroutine-capable code: when
running in coroutine context, the code yields instead of blocking.
Running such commands in coroutine context improves things from "blocks
the main loop" to "blocks all monitor commands". Sadly, code exists
that falls apart in coroutine context. So we had to make running in
coroutine context opt-in. Right now only two commands opt in:
block_resize and screendump. Hopefully, we can get to the point where
most or all do.

Until all do, the dispatcher needs to run some commands in coroutine
context, and others outside coroutine context. How?

The QMP command dispatcher runs in a coroutine in the main loop (HMP
works differently, but let's ignore it here). If the command can be
executed in coroutine context, the dispatcher calls its handler as
before. Right now, we take this path just for block_resize and
screendump. Else, we create a bottom half that calls the handler, and
schedule it to run in the main loop. Right now, we take this path for
all the other commands. This if / else is in qmp_dispatch(). The else
is the only way to get do_qmp_dispatch_bh() to run. Merge commit
b7092cda1b3, October 2020.

Paolo wrote:

    While monitor commands in general start only in the main thread,
    bottom halves can also run (via aio_poll) in the vCPU threads,
    always under the BQL.

Today I learned...

So, splicing in a bottom half unmoored monitor commands from the main
loop. We weren't aware of that, as our commit messages show.

I guess the commands themselves don't care; all they need is the BQL.

However, did we unwittingly change what can get blocked? Before,
monitor commands could block only the main thread. Now they can also
block vCPU threads. Impact?
* Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll
  2023-04-20  6:11 ` Markus Armbruster
@ 2023-04-20  6:55   ` Paolo Bonzini
  2023-04-26 14:31     ` Fiona Ebner
  0 siblings, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2023-04-20 6:55 UTC (permalink / raw)
To: Markus Armbruster
Cc: Fiona Ebner, QEMU Developers, open list:Block layer core,
	Michael Roth, Fam Zheng, Stefan Hajnoczi, Thomas Lamprecht,
	Kevin Wolf, Peter Xu

On Thu, Apr 20, 2023 at 08:11 Markus Armbruster <armbru@redhat.com> wrote:

> So, splicing in a bottom half unmoored monitor commands from the main
> loop. We weren't aware of that, as our commit messages show.
>
> I guess the commands themselves don't care; all they need is the BQL.
>
> However, did we unwittingly change what can get blocked? Before,
> monitor commands could block only the main thread. Now they can also
> block vCPU threads. Impact?

Monitor commands could always block vCPU threads through the BQL(*).
However, aio_poll() only runs in the vCPU threads in very special cases,
typically associated with resetting a device, which causes a blk_drain()
on the device's BlockBackend. So it is not a performance issue.

However, liberal reuse of the main block layer AioContext could indeed
be a *correctness* issue. I need to re-read Fiona's report instead of
stopping at the first three lines, because it's the evening. :)

Paolo

(*) Some of you may remember that when migration ran under the BQL, it
was also a very effective way to ensure that vCPU threads were slowed
down, so it helped migration converge. Once that hurdle was removed, we
had to add manual sleeping in the vCPU threads!
* Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll
  2023-04-20  6:55 ` Paolo Bonzini
@ 2023-04-26 14:31   ` Fiona Ebner
  2023-04-27 11:03     ` Kevin Wolf
  0 siblings, 1 reply; 21+ messages in thread
From: Fiona Ebner @ 2023-04-26 14:31 UTC (permalink / raw)
To: Paolo Bonzini, Markus Armbruster
Cc: QEMU Developers, open list:Block layer core, Michael Roth,
	Fam Zheng, Stefan Hajnoczi, Thomas Lamprecht, Kevin Wolf, Peter Xu

On 20.04.23 at 08:55, Paolo Bonzini wrote:
>
> On Thu, Apr 20, 2023 at 08:11 Markus Armbruster <armbru@redhat.com> wrote:
>
>     So, splicing in a bottom half unmoored monitor commands from the main
>     loop. We weren't aware of that, as our commit messages show.
>
>     I guess the commands themselves don't care; all they need is the BQL.
>
>     However, did we unwittingly change what can get blocked? Before,
>     monitor commands could block only the main thread. Now they can also
>     block vCPU threads. Impact?
>
> Monitor commands could always block vCPU threads through the BQL(*).
> However, aio_poll() only runs in the vCPU threads in very special cases;
> typically associated to resetting a device which causes a blk_drain() on
> the device's BlockBackend. So it is not a performance issue.

AFAIU, all generated coroutine wrappers use aio_poll. In my backtrace,
aio_poll happens via blk_pwrite for a pflash device. So a bit more often
than "very special cases" ;)

> However, liberal reuse of the main block layer AioContext could indeed
> be a *correctness* issue. I need to re-read Fiona's report instead of
> stopping at the first three lines because it's the evening. :)

For me, being called in a vCPU thread caused problems with a custom QMP
function patched in by Proxmox. The function uses a newly opened
BlockBackend and calls qemu_mutex_unlock_iothread(), after which
qemu_get_current_aio_context() returns 0x0 (when running in the main
thread, it still returns the main thread's AioContext). It then calls
blk_pwritev, which is also a generated coroutine wrapper, and the
    assert(qemu_get_current_aio_context() == qemu_get_aio_context());
in the else branch of the AIO_WAIT_WHILE_INTERNAL macro fails.

Sounds like there's room for improvement in our code :/ I'm not aware
of something similar in upstream QEMU.

Thanks to Markus for the detailed history lesson!

Best Regards,
Fiona
* Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll
  2023-04-26 14:31 ` Fiona Ebner
@ 2023-04-27 11:03   ` Kevin Wolf
  2023-04-27 12:27     ` Fiona Ebner
  0 siblings, 1 reply; 21+ messages in thread
From: Kevin Wolf @ 2023-04-27 11:03 UTC (permalink / raw)
To: Fiona Ebner
Cc: Paolo Bonzini, Markus Armbruster, QEMU Developers,
	open list:Block layer core, Michael Roth, Fam Zheng,
	Stefan Hajnoczi, Thomas Lamprecht, Peter Xu

On 26.04.2023 at 16:31, Fiona Ebner wrote:
> On 20.04.23 at 08:55, Paolo Bonzini wrote:
> >
> > On Thu, Apr 20, 2023 at 08:11 Markus Armbruster <armbru@redhat.com> wrote:
> >
> >     So, splicing in a bottom half unmoored monitor commands from the main
> >     loop. We weren't aware of that, as our commit messages show.
> >
> >     I guess the commands themselves don't care; all they need is the BQL.
> >
> >     However, did we unwittingly change what can get blocked? Before,
> >     monitor commands could block only the main thread. Now they can also
> >     block vCPU threads. Impact?
> >
> > Monitor commands could always block vCPU threads through the BQL(*).
> > However, aio_poll() only runs in the vCPU threads in very special cases;
> > typically associated to resetting a device which causes a blk_drain() on
> > the device's BlockBackend. So it is not a performance issue.
>
> AFAIU, all generated coroutine wrappers use aio_poll. In my backtrace
> aio_poll happens via blk_pwrite for a pflash device. So a bit more
> often than "very special cases" ;)

Yes, it's a common thing for devices that start requests from the vcpu
thread when handling I/O (as opposed to devices that use an eventfd or
similar mechanisms).

> > However, liberal reuse of the main block layer AioContext could indeed
> > be a *correctness* issue. I need to re-read Fiona's report instead of
> > stopping at the first three lines because it's the evening. :)
>
> For me, being called in a vCPU thread caused problems with a custom QMP
> function patched in by Proxmox. The function uses a newly opened
> BlockBackend and calls qemu_mutex_unlock_iothread() after which
> qemu_get_current_aio_context() returns 0x0 (when running in the main
> thread, it still returns the main thread's AioContext). It then calls
> blk_pwritev which is also a generated coroutine wrapper and the
> assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> in the else branch of the AIO_WAIT_WHILE_INTERNAL macro fails.
>
> Sounds like there's room for improvement in our code :/ I'm not aware
> of something similar in upstream QEMU.

Yes, even if it didn't crash immediately, calling blk_*() without
holding a lock is invalid. In many cases, this is the BQL. If you don't
hold it while calling the function from a vcpu thread, you could run
into races with the main thread, which would probably be very painful to
debug.

Kevin
* Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll
  2023-04-27 11:03 ` Kevin Wolf
@ 2023-04-27 12:27   ` Fiona Ebner
  2023-04-27 14:36     ` Juan Quintela
  0 siblings, 1 reply; 21+ messages in thread
From: Fiona Ebner @ 2023-04-27 12:27 UTC (permalink / raw)
To: Kevin Wolf
Cc: Paolo Bonzini, Markus Armbruster, QEMU Developers,
	open list:Block layer core, Michael Roth, Fam Zheng,
	Stefan Hajnoczi, Thomas Lamprecht, Peter Xu, quintela@redhat.com

On 27.04.23 at 13:03, Kevin Wolf wrote:
> On 26.04.2023 at 16:31, Fiona Ebner wrote:
>> On 20.04.23 at 08:55, Paolo Bonzini wrote:
>>>
>>> On Thu, Apr 20, 2023 at 08:11 Markus Armbruster <armbru@redhat.com> wrote:
>>>
>>>     So, splicing in a bottom half unmoored monitor commands from the main
>>>     loop. We weren't aware of that, as our commit messages show.
>>>
>>>     I guess the commands themselves don't care; all they need is the BQL.
>>>
>>>     However, did we unwittingly change what can get blocked? Before,
>>>     monitor commands could block only the main thread. Now they can also
>>>     block vCPU threads. Impact?
>>>
>>> Monitor commands could always block vCPU threads through the BQL(*).
>>> However, aio_poll() only runs in the vCPU threads in very special cases;
>>> typically associated to resetting a device which causes a blk_drain() on
>>> the device's BlockBackend. So it is not a performance issue.
>>
>> AFAIU, all generated coroutine wrappers use aio_poll. In my backtrace
>> aio_poll happens via blk_pwrite for a pflash device. So a bit more
>> often than "very special cases" ;)
>
> Yes, it's a common thing for devices that start requests from the vcpu
> thread when handling I/O (as opposed to devices that use an eventfd or
> similar mechanisms).
>
>>> However, liberal reuse of the main block layer AioContext could indeed
>>> be a *correctness* issue. I need to re-read Fiona's report instead of
>>> stopping at the first three lines because it's the evening. :)
>>
>> For me, being called in a vCPU thread caused problems with a custom QMP
>> function patched in by Proxmox. The function uses a newly opened
>> BlockBackend and calls qemu_mutex_unlock_iothread() after which
>> qemu_get_current_aio_context() returns 0x0 (when running in the main
>> thread, it still returns the main thread's AioContext). It then calls
>> blk_pwritev which is also a generated coroutine wrapper and the
>> assert(qemu_get_current_aio_context() == qemu_get_aio_context());
>> in the else branch of the AIO_WAIT_WHILE_INTERNAL macro fails.
>>
>> Sounds like there's room for improvement in our code :/ I'm not aware
>> of something similar in upstream QEMU.
>
> Yes, even if it didn't crash immediately, calling blk_*() without
> holding a lock is invalid. In many cases, this is the BQL. If you don't
> hold it while calling the function from a vcpu thread, you could run
> into races with the main thread, which would probably be very painful to
> debug.
>
> Kevin

Our function is a custom variant of saving a snapshot and uses
qemu_savevm_state_setup(), which is why the qemu_mutex_unlock_iothread()
is there. I looked for inspiration for how upstream does things, and it
turns out that upstream QEMU v8.0.0 has essentially the same issue with
snapshot-save. When snapshot_save_job_bh runs in a vCPU thread instead
of the main thread, the situation is the same: after
qemu_mutex_unlock_iothread(), qemu_get_current_aio_context() will return
0x0, and then the assertion in the AIO_WAIT_WHILE_INTERNAL macro fails
(this time the generated coroutine wrapper is bdrv_writev_vmstate)[0].

So all bottom halves scheduled for the main thread's AioContext can
potentially get to run in a vCPU thread and need to be very careful with
things like qemu_mutex_unlock_iothread.

Is qemu_get_current_aio_context() returning 0x0 expected? I haven't
looked into why it happens yet. Does there need to be a way to drop the
BQL without also giving up the main thread's AioContext, or would it be
enough to re-acquire the context?

CC-ing Juan as the migration maintainer.

Best Regards,
Fiona

[0]:
> Thread 21 "CPU 0/KVM" received signal SIGABRT, Aborted.
> __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> 50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> (gdb) bt
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x00007f9027b3e537 in __GI_abort () at abort.c:79
> #2  0x00007f9027b3e40f in __assert_fail_base (fmt=0x7f9027cb66a8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x558ed44fcec0 "qemu_get_current_aio_context() == qemu_get_aio_context()", file=0x558ed44fce80 "/home/febner/repos/qemu/block/block-gen.h", line=43, function=<optimized out>) at assert.c:92
> #3  0x00007f9027b4d662 in __GI___assert_fail (assertion=0x558ed44fcec0 "qemu_get_current_aio_context() == qemu_get_aio_context()", file=0x558ed44fce80 "/home/febner/repos/qemu/block/block-gen.h", line=43, function=0x558ed44fcf80 <__PRETTY_FUNCTION__.14> "bdrv_poll_co") at assert.c:101
> #4  0x0000558ed412df5f in bdrv_poll_co (s=0x7f8ffcff37a0) at /home/febner/repos/qemu/block/block-gen.h:43
> #5  0x0000558ed412f4cd in bdrv_writev_vmstate (bs=0x558ed60536a0, qiov=0x7f8ffcff3840, pos=0) at block/block-gen.c:809
> #6  0x0000558ed3df36d0 in qio_channel_block_writev (ioc=0x7f8ff40ac060, iov=0x7f8ff43f6350, niov=1, fds=0x0, nfds=0, flags=0, errp=0x7f8ffcff39c0) at ../migration/channel-block.c:89
> #7  0x0000558ed40feedb in qio_channel_writev_full (ioc=0x7f8ff40ac060, iov=0x7f8ff43f6350, niov=1, fds=0x0, nfds=0, flags=0, errp=0x7f8ffcff39c0) at ../io/channel.c:108
> #8  0x0000558ed40ff3c3 in qio_channel_writev_full_all (ioc=0x7f8ff40ac060, iov=0x7f8ff4648040, niov=1, fds=0x0, nfds=0, flags=0, errp=0x7f8ffcff39c0) at ../io/channel.c:263
> #9  0x0000558ed40ff2e4 in qio_channel_writev_all (ioc=0x7f8ff40ac060, iov=0x7f8ff4648040, niov=1, errp=0x7f8ffcff39c0) at ../io/channel.c:242
> #10 0x0000558ed3dee4dc in qemu_fflush (f=0x7f8ff4640000) at ../migration/qemu-file.c:302
> #11 0x0000558ed4050f91 in ram_save_setup (f=0x7f8ff4640000, opaque=0x558ed4ca34c0 <ram_state>) at ../migration/ram.c:3302
> #12 0x0000558ed3e141c8 in qemu_savevm_state_setup (f=0x7f8ff4640000) at ../migration/savevm.c:1266
> #13 0x0000558ed3e14eed in qemu_savevm_state (f=0x7f8ff4640000, errp=0x558ed68c5238) at ../migration/savevm.c:1626
> #14 0x0000558ed3e1755e in save_snapshot (name=0x558ed72af790 "snap0", overwrite=false, vmstate=0x558ed6708ce0 "scsi0", has_devices=true, devices=0x558ed66d6a60, errp=0x558ed68c5238) at ../migration/savevm.c:2954
> #15 0x0000558ed3e17fb1 in snapshot_save_job_bh (opaque=0x558ed68c5170) at ../migration/savevm.c:3253
> #16 0x0000558ed42f050a in aio_bh_call (bh=0x558ed671ae00) at ../util/async.c:155
> #17 0x0000558ed42f0615 in aio_bh_poll (ctx=0x558ed5c62910) at ../util/async.c:184
> #18 0x0000558ed42d47b8 in aio_poll (ctx=0x558ed5c62910, blocking=true) at ../util/aio-posix.c:721
> #19 0x0000558ed412df1c in bdrv_poll_co (s=0x7f8ffcff3eb0) at /home/febner/repos/qemu/block/block-gen.h:43
> #20 0x0000558ed4130c3a in blk_pwrite (blk=0x558ed5ed4f60, offset=230912, bytes=512, buf=0x7f8ffc438600, flags=0) at block/block-gen.c:1650
> #21 0x0000558ed3ba9078 in pflash_update (pfl=0x558ed5eb7b30, offset=230912, size=1) at ../hw/block/pflash_cfi01.c:394
> #22 0x0000558ed3ba9749 in pflash_write (pfl=0x558ed5eb7b30, offset=231232, value=0, width=1, be=0) at ../hw/block/pflash_cfi01.c:522
> #23 0x0000558ed3ba9cda in pflash_mem_write_with_attrs (opaque=0x558ed5eb7b30, addr=231232, value=0, len=1, attrs=...) at ../hw/block/pflash_cfi01.c:681
> #24 0x0000558ed402a36a in memory_region_write_with_attrs_accessor (mr=0x558ed5eb7ef0, addr=231232, value=0x7f8ffcff40c8, size=1, shift=0, mask=255, attrs=...) at ../softmmu/memory.c:514
> #25 0x0000558ed402a4a9 in access_with_adjusted_size (addr=231232, value=0x7f8ffcff40c8, size=1, access_size_min=1, access_size_max=4, access_fn=0x558ed402a270 <memory_region_write_with_attrs_accessor>, mr=0x558ed5eb7ef0, attrs=...) at ../softmmu/memory.c:555
> #26 0x0000558ed402d5de in memory_region_dispatch_write (mr=0x558ed5eb7ef0, addr=231232, data=0, op=MO_8, attrs=...) at ../softmmu/memory.c:1522
> #27 0x0000558ed403a6f4 in flatview_write_continue (fv=0x558ed66d62c0, addr=4291004224, attrs=..., ptr=0x7f9029957028, len=1, addr1=231232, l=1, mr=0x558ed5eb7ef0) at ../softmmu/physmem.c:2641
> #28 0x0000558ed403a857 in flatview_write (fv=0x558ed66d62c0, addr=4291004224, attrs=..., buf=0x7f9029957028, len=1) at ../softmmu/physmem.c:2683
> #29 0x0000558ed403ac07 in address_space_write (as=0x558ed4ca2b20 <address_space_memory>, addr=4291004224, attrs=..., buf=0x7f9029957028, len=1) at ../softmmu/physmem.c:2779
> #30 0x0000558ed403ac74 in address_space_rw (as=0x558ed4ca2b20 <address_space_memory>, addr=4291004224, attrs=..., buf=0x7f9029957028, len=1, is_write=true) at ../softmmu/physmem.c:2789
> #31 0x0000558ed40cea88 in kvm_cpu_exec (cpu=0x558ed622a910) at ../accel/kvm/kvm-all.c:2989
> #32 0x0000558ed40d179a in kvm_vcpu_thread_fn (arg=0x558ed622a910) at ../accel/kvm/kvm-accel-ops.c:51
> #33 0x0000558ed42d925f in qemu_thread_start (args=0x558ed5c68c80) at ../util/qemu-thread-posix.c:541
> #34 0x00007f9028ab7ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
> #35 0x00007f9027c18a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
* Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll 2023-04-27 12:27 ` Fiona Ebner @ 2023-04-27 14:36 ` Juan Quintela 2023-04-27 14:56 ` Peter Xu 2023-04-28 7:23 ` Fiona Ebner 0 siblings, 2 replies; 21+ messages in thread From: Juan Quintela @ 2023-04-27 14:36 UTC (permalink / raw) To: Fiona Ebner Cc: Kevin Wolf, Paolo Bonzini, Markus Armbruster, QEMU Developers, open list:Block layer core, Michael Roth, Fam Zheng, Stefan Hajnoczi, Thomas Lamprecht, Peter Xu Fiona Ebner <f.ebner@proxmox.com> wrote: > Am 27.04.23 um 13:03 schrieb Kevin Wolf: >> Am 26.04.2023 um 16:31 hat Fiona Ebner geschrieben: >>> Am 20.04.23 um 08:55 schrieb Paolo Bonzini: Hi > Our function is a custom variant of saving a snapshot and uses > qemu_savevm_state_setup(), which is why the qemu_mutex_unlock_iothread() > is there. I looked for inspiration for how upstream does things and it > turns out that upstream QEMU v8.0.0 has essentially the same issue with > snapshot-save. When snapshot_save_job_bh runs in a vCPU thread instead > of the main thread, the situation is the same: after > qemu_mutex_unlock_iothread(), qemu_get_current_aio_context() will return > 0x0 and then the assertion in the AIO_WAIT_WHILE_INTERNAL macro fails > (this time the generated coroutine wrapper is bdrv_writev_vmstate)[0]. > > > So all bottom halves scheduled for the main thread's AioContext can > potentially get to run in a vCPU thread and need to be very careful with > things like qemu_mutex_unlock_iothread. > > Is qemu_get_current_aio_context() returning 0x0 expected? I haven't > looked into why it happens yet. Does there need to be a way to drop the > BQL without also giving up the main thread's AioContext or would it be > enough to re-acquire the context? > > CC-ing Juan as the migration maintainer. This is the world backwards. 
The tradition is that migration people blame block layer people for breaking things and for help, not the other way around O:-) > Best Regards, > Fiona > > [0]: >> Thread 21 "CPU 0/KVM" received signal SIGABRT, Aborted. >> __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 >> 50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. >> (gdb) bt >> #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 >> #1 0x00007f9027b3e537 in __GI_abort () at abort.c:79 >> #2 0x00007f9027b3e40f in __assert_fail_base (fmt=0x7f9027cb66a8 >> "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", >> assertion=0x558ed44fcec0 "qemu_get_current_aio_context() == >> qemu_get_aio_context()", file=0x558ed44fce80 >> "/home/febner/repos/qemu/block/block-gen.h", line=43, >> function=<optimized out>) at assert.c:92 >> #3 0x00007f9027b4d662 in __GI___assert_fail >> (assertion=0x558ed44fcec0 "qemu_get_current_aio_context() == >> qemu_get_aio_context()", file=0x558ed44fce80 >> "/home/febner/repos/qemu/block/block-gen.h", line=43, >> function=0x558ed44fcf80 <__PRETTY_FUNCTION__.14> "bdrv_poll_co") at >> assert.c:101 >> #4 0x0000558ed412df5f in bdrv_poll_co (s=0x7f8ffcff37a0) at /home/febner/repos/qemu/block/block-gen.h:43 >> #5 0x0000558ed412f4cd in bdrv_writev_vmstate (bs=0x558ed60536a0, qiov=0x7f8ffcff3840, pos=0) at block/block-gen.c:809 >> #6 0x0000558ed3df36d0 in qio_channel_block_writev >> (ioc=0x7f8ff40ac060, iov=0x7f8ff43f6350, niov=1, fds=0x0, nfds=0, >> flags=0, errp=0x7f8ffcff39c0) at ../migration/channel-block.c:89 >> #7 0x0000558ed40feedb in qio_channel_writev_full >> (ioc=0x7f8ff40ac060, iov=0x7f8ff43f6350, niov=1, fds=0x0, nfds=0, >> flags=0, errp=0x7f8ffcff39c0) at ../io/channel.c:108 >> #8 0x0000558ed40ff3c3 in qio_channel_writev_full_all >> (ioc=0x7f8ff40ac060, iov=0x7f8ff4648040, niov=1, fds=0x0, nfds=0, >> flags=0, errp=0x7f8ffcff39c0) at ../io/channel.c:263 >> #9 0x0000558ed40ff2e4 in qio_channel_writev_all (ioc=0x7f8ff40ac060, >> 
iov=0x7f8ff4648040, niov=1, errp=0x7f8ffcff39c0) at >> ../io/channel.c:242 >> #10 0x0000558ed3dee4dc in qemu_fflush (f=0x7f8ff4640000) at ../migration/qemu-file.c:302 >> #11 0x0000558ed4050f91 in ram_save_setup (f=0x7f8ff4640000, opaque=0x558ed4ca34c0 <ram_state>) at ../migration/ram.c:3302 >> #12 0x0000558ed3e141c8 in qemu_savevm_state_setup (f=0x7f8ff4640000) at ../migration/savevm.c:1266 >> #13 0x0000558ed3e14eed in qemu_savevm_state (f=0x7f8ff4640000, errp=0x558ed68c5238) at ../migration/savevm.c:1626 >> #14 0x0000558ed3e1755e in save_snapshot (name=0x558ed72af790 >> "snap0", overwrite=false, vmstate=0x558ed6708ce0 "scsi0", >> has_devices=true, devices=0x558ed66d6a60, errp=0x558ed68c5238) at >> ../migration/savevm.c:2954 >> #15 0x0000558ed3e17fb1 in snapshot_save_job_bh (opaque=0x558ed68c5170) at ../migration/savevm.c:3253 >> #16 0x0000558ed42f050a in aio_bh_call (bh=0x558ed671ae00) at ../util/async.c:155 >> #17 0x0000558ed42f0615 in aio_bh_poll (ctx=0x558ed5c62910) at ../util/async.c:184 >> #18 0x0000558ed42d47b8 in aio_poll (ctx=0x558ed5c62910, blocking=true) at ../util/aio-posix.c:721 >> #19 0x0000558ed412df1c in bdrv_poll_co (s=0x7f8ffcff3eb0) at /home/febner/repos/qemu/block/block-gen.h:43 >> #20 0x0000558ed4130c3a in blk_pwrite (blk=0x558ed5ed4f60, >> offset=230912, bytes=512, buf=0x7f8ffc438600, flags=0) at >> block/block-gen.c:1650 >> #21 0x0000558ed3ba9078 in pflash_update (pfl=0x558ed5eb7b30, offset=230912, size=1) at ../hw/block/pflash_cfi01.c:394 >> #22 0x0000558ed3ba9749 in pflash_write (pfl=0x558ed5eb7b30, >> offset=231232, value=0, width=1, be=0) at >> ../hw/block/pflash_cfi01.c:522 >> #23 0x0000558ed3ba9cda in pflash_mem_write_with_attrs >> (opaque=0x558ed5eb7b30, addr=231232, value=0, len=1, attrs=...) at >> ../hw/block/pflash_cfi01.c:681 >> #24 0x0000558ed402a36a in memory_region_write_with_attrs_accessor >> (mr=0x558ed5eb7ef0, addr=231232, value=0x7f8ffcff40c8, size=1, >> shift=0, mask=255, attrs=...) 
at ../softmmu/memory.c:514
>> #25 0x0000558ed402a4a9 in access_with_adjusted_size (addr=231232, value=0x7f8ffcff40c8, size=1, access_size_min=1, access_size_max=4, access_fn=0x558ed402a270 <memory_region_write_with_attrs_accessor>, mr=0x558ed5eb7ef0, attrs=...) at ../softmmu/memory.c:555
>> #26 0x0000558ed402d5de in memory_region_dispatch_write (mr=0x558ed5eb7ef0, addr=231232, data=0, op=MO_8, attrs=...) at ../softmmu/memory.c:1522
>> #27 0x0000558ed403a6f4 in flatview_write_continue (fv=0x558ed66d62c0, addr=4291004224, attrs=..., ptr=0x7f9029957028, len=1, addr1=231232, l=1, mr=0x558ed5eb7ef0) at ../softmmu/physmem.c:2641
>> #28 0x0000558ed403a857 in flatview_write (fv=0x558ed66d62c0, addr=4291004224, attrs=..., buf=0x7f9029957028, len=1) at ../softmmu/physmem.c:2683
>> #29 0x0000558ed403ac07 in address_space_write (as=0x558ed4ca2b20 <address_space_memory>, addr=4291004224, attrs=..., buf=0x7f9029957028, len=1) at ../softmmu/physmem.c:2779
>> #30 0x0000558ed403ac74 in address_space_rw (as=0x558ed4ca2b20 <address_space_memory>, addr=4291004224, attrs=..., buf=0x7f9029957028, len=1, is_write=true) at ../softmmu/physmem.c:2789
>> #31 0x0000558ed40cea88 in kvm_cpu_exec (cpu=0x558ed622a910) at ../accel/kvm/kvm-all.c:2989
>> #32 0x0000558ed40d179a in kvm_vcpu_thread_fn (arg=0x558ed622a910) at ../accel/kvm/kvm-accel-ops.c:51
>> #33 0x0000558ed42d925f in qemu_thread_start (args=0x558ed5c68c80) at ../util/qemu-thread-posix.c:541
>> #34 0x00007f9028ab7ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
>> #35 0x00007f9027c18a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

To see that I am understanding this right:

- you create a thread
- that calls a memory_region operation
- that calls a device write function
- that calls the block layer
- that creates a snapshot
- that calls the migration code
- that calls the block layer again

Without further investigation, I have no clue what is going on here,
sorry.

Later, Juan.

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll
  2023-04-27 14:36 ` Juan Quintela
@ 2023-04-27 14:56   ` Peter Xu
  2023-04-28  7:53     ` Fiona Ebner
  2023-04-28  7:23   ` Fiona Ebner
  1 sibling, 1 reply; 21+ messages in thread
From: Peter Xu @ 2023-04-27 14:56 UTC (permalink / raw)
  To: Juan Quintela
  Cc: Fiona Ebner, Kevin Wolf, Paolo Bonzini, Markus Armbruster,
	QEMU Developers, open list:Block layer core, Michael Roth,
	Fam Zheng, Stefan Hajnoczi, Thomas Lamprecht

On Thu, Apr 27, 2023 at 04:36:14PM +0200, Juan Quintela wrote:
> Fiona Ebner <f.ebner@proxmox.com> wrote:
> > Am 27.04.23 um 13:03 schrieb Kevin Wolf:
> > > Am 26.04.2023 um 16:31 hat Fiona Ebner geschrieben:
> > > > Am 20.04.23 um 08:55 schrieb Paolo Bonzini:
>
> Hi
>
> > Our function is a custom variant of saving a snapshot and uses
> > qemu_savevm_state_setup(), which is why the qemu_mutex_unlock_iothread()
> > is there. I looked for inspiration for how upstream does things and it
> > turns out that upstream QEMU v8.0.0 has essentially the same issue with
> > snapshot-save. When snapshot_save_job_bh runs in a vCPU thread instead
> > of the main thread, the situation is the same: after
> > qemu_mutex_unlock_iothread(), qemu_get_current_aio_context() will return
> > 0x0 and then the assertion in the AIO_WAIT_WHILE_INTERNAL macro fails
> > (this time the generated coroutine wrapper is bdrv_writev_vmstate)[0].
> >
> > So all bottom halves scheduled for the main thread's AioContext can
> > potentially get to run in a vCPU thread and need to be very careful with
> > things like qemu_mutex_unlock_iothread.
> >
> > Is qemu_get_current_aio_context() returning 0x0 expected? I haven't
> > looked into why it happens yet. Does there need to be a way to drop the
> > BQL without also giving up the main thread's AioContext or would it be
> > enough to re-acquire the context?
> >
> > CC-ing Juan as the migration maintainer.
>
> This is the world backwards.
> The tradition is that migration people blame block layer people for
> breaking things and for help, not the other way around O:-)
>
> > Best Regards,
> > Fiona
> >
> > [0]:
> >> [backtrace snipped; identical to the one quoted earlier in the thread]

Totally unfamiliar with block jobs, but...
does it mean that snapshot_*_job_bh()s should just always make sure BQL
taken?

Thanks,

--
Peter Xu

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll
  2023-04-27 14:56 ` Peter Xu
@ 2023-04-28  7:53   ` Fiona Ebner
  0 siblings, 0 replies; 21+ messages in thread
From: Fiona Ebner @ 2023-04-28  7:53 UTC (permalink / raw)
  To: Peter Xu, Juan Quintela
  Cc: Kevin Wolf, Paolo Bonzini, Markus Armbruster, QEMU Developers,
	open list:Block layer core, Michael Roth, Fam Zheng,
	Stefan Hajnoczi, Thomas Lamprecht

Am 27.04.23 um 16:56 schrieb Peter Xu:
> On Thu, Apr 27, 2023 at 04:36:14PM +0200, Juan Quintela wrote:
>> Fiona Ebner <f.ebner@proxmox.com> wrote:
>>> [earlier context snipped; identical to the quote in the previous message]
>>
>> This is the world backwards.
>> The tradition is that migration people blame block layer people for
>> breaking things and for help, not the other way around O:-)
>>
>>> Best Regards,
>>> Fiona
>>>
>>> [0]:
>>>> [backtrace snipped; identical to the one quoted earlier in the thread]
>
> Totally unfamiliar with block jobs, but...
> does it mean that snapshot_*_job_bh()s should just always make sure BQL
> taken?

It is taken in the beginning, but qemu_savevm_state drops it. I suppose
there is a good reason for it, because when patching out the
qemu_mutex_unlock_iothread call, I get

> ../softmmu/cpus.c:503:qemu_mutex_lock_iothread_impl: assertion failed: (!qemu_mutex_iothread_locked())

when using the snapshot-save QMP command.

To avoid the assertion failure from the backtrace, I think it would be
enough to ensure that the main thread's AioContext is still acquired
between the qemu_mutex_unlock_iothread and qemu_mutex_lock_iothread
calls (which currently is not the case when snapshot_save_job_bh runs in
a vCPU thread). But I am not sure whether that is enough for full
correctness.

Best Regards,
Fiona

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll
  2023-04-27 14:36 ` Juan Quintela
  2023-04-27 14:56   ` Peter Xu
@ 2023-04-28  7:23   ` Fiona Ebner
  2023-04-28  7:47     ` Juan Quintela
  1 sibling, 1 reply; 21+ messages in thread
From: Fiona Ebner @ 2023-04-28  7:23 UTC (permalink / raw)
  To: quintela
  Cc: Kevin Wolf, Paolo Bonzini, Markus Armbruster, QEMU Developers,
	open list:Block layer core, Michael Roth, Fam Zheng,
	Stefan Hajnoczi, Thomas Lamprecht, Peter Xu

Am 27.04.23 um 16:36 schrieb Juan Quintela:
> Fiona Ebner <f.ebner@proxmox.com> wrote:
>> [earlier context snipped; identical to the quote in the previous message]
>
> This is the world backwards.
> The tradition is that migration people blame block layer people for
> breaking things and for help, not the other way around O:-)

Sorry, if I didn't provide enough context/explanation. See below for my
attempt to re-iterate. I CC'ed you, because the issue happens as part of
snapshot-save and in particular the qemu_mutex_unlock_iothread call in
qemu_savevm_state is one of the ingredients leading to the problem.

>
>> Best Regards,
>> Fiona
>>
>> [0]:
>>> [backtrace snipped; identical to the one quoted earlier in the thread]
>
> To see that I am understanding this right:
>
> - you create a thread
> - that calls a memory_region operation
> - that calls a device write function
> - that calls the block layer
> - that creates a snapshot
> - that calls the migration code
> - that calls the block layer again
>
> Without further investigation, I have no clue what is going on here,
> sorry.
>
> Later, Juan.

All I'm doing is using QEMU (a build of upstream's v8.0.0) in intended
ways, I promise! In particular, I'm doing two things at the same time
repeatedly:
1. Write to a pflash drive from within the guest.
2. Issue a snapshot-save QMP command (in a way that doesn't lead to an
early error).

(I actually also used a debugger to break on pflash_update and
snapshot_save_job_bh, manually continuing until I triggered the
problematic situation. It's very racy, because it depends on the host OS
switching threads at the correct time.)

Now we need to be aware of two things:
1. As discussed earlier in the mail thread, if the host OS switches
threads at an inconvenient time, a bottom half scheduled for the main
thread's AioContext can end up being executed as part of a vCPU
thread's aio_poll.
2. Generated coroutine wrappers for block layer functions spawn the
coroutine and use AIO_WAIT_WHILE/aio_poll to wait for it to finish.

What happens in the backtrace above is:
1. The write to the pflash drive uses blk_pwrite, which leads to an
aio_poll in the vCPU thread.
2. The snapshot_save_job_bh bottom half, which was scheduled for the
main thread's AioContext, is executed as part of the vCPU thread's
aio_poll.
3. qemu_savevm_state is called.
4. qemu_mutex_unlock_iothread is called. Now
qemu_get_current_aio_context returns 0x0. Usually, snapshot_save_job_bh
runs in the main thread, in which case qemu_get_current_aio_context
still returns the main thread's AioContext at this point.
5. bdrv_writev_vmstate is executed as part of the usual savevm setup.
6. bdrv_writev_vmstate is a generated coroutine wrapper, so it uses
AIO_WAIT_WHILE.
7. The assertion to have the main thread's AioContext inside the
AIO_WAIT_WHILE macro fails.

Best Regards,
Fiona

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll
  2023-04-28  7:23 ` Fiona Ebner
@ 2023-04-28  7:47   ` Juan Quintela
  2023-04-28  8:30     ` Kevin Wolf
  0 siblings, 1 reply; 21+ messages in thread
From: Juan Quintela @ 2023-04-28  7:47 UTC (permalink / raw)
  To: Fiona Ebner
  Cc: Kevin Wolf, Paolo Bonzini, Markus Armbruster, QEMU Developers,
	open list:Block layer core, Michael Roth, Fam Zheng,
	Stefan Hajnoczi, Thomas Lamprecht, Peter Xu

Fiona Ebner <f.ebner@proxmox.com> wrote:
> Am 27.04.23 um 16:36 schrieb Juan Quintela:
>> [earlier context snipped; identical to the quote in the previous message]
>>
>> This is the world backwards.
>> The tradition is that migration people blame block layer people for
>> breaking things and for help, not the other way around O:-)
>
> Sorry, if I didn't provide enough context/explanation. See below for my
> attempt to re-iterate. I CC'ed you, because the issue happens as part of
> snapshot-save and in particular the qemu_mutex_unlock_iothread call in
> qemu_savevm_state is one of the ingredients leading to the problem.

This was a joke O:-)

>> To see that I am understanding this right:
>>
>> - you create a thread
>> - that calls a memory_region operation
>> - that calls a device write function
>> - that calls the block layer
>> - that creates a snapshot
>> - that calls the migration code
>> - that calls the block layer again
>>
>> Without further investigation, I have no clue what is going on here,
>> sorry.
>>
>> Later, Juan.
>
> All I'm doing is using QEMU (a build of upstream's v8.0.0) in intended
> ways, I promise! In particular, I'm doing two things at the same time
> repeatedly:
> 1. Write to a pflash drive from within the guest.
> 2. Issue a snapshot-save QMP command (in a way that doesn't lead to an
> early error).
>
> (I actually also used a debugger to break on pflash_update and
> snapshot_save_job_bh, manually continuing until I triggered the
> problematic situation. It's very racy, because it depends on the host OS
> to switch threads at the correct time.)

I think the block layer is the problem (famous last words).

> Now we need to be aware of two things:
> 1. As discussed earlier in the mail thread, if the host OS switches
> threads at an inconvenient time, it can happen that a bottom half
> scheduled for the main thread's AioContext can be executed as part of a
> vCPU thread's aio_poll.
> 2. Generated coroutine wrappers for block layer functions spawn the
> coroutine and use AIO_WAIT_WHILE/aio_poll to wait for it to finish.
>
> What happens in the backtrace above is:
> 1.
The write to the pflash drive uses blk_pwrite which leads to an > aio_poll in the vCPU thread. > 2. The snapshot_save_job_bh bottom half, that was scheduled for the main > thread's AioContext, is executed as part of the vCPU thread's aio_poll. > 3. qemu_savevm_state is called. > 4. qemu_mutex_unlock_iothread is called. Now > qemu_get_current_aio_context returns 0x0. Usually, snapshot_save_job_bh > runs in the main thread, in which case qemu_get_current_aio_context > still returns the main thread's AioContext at this point. I am perhaps a bit ingenuous here, but it is there a way to convince qemu that snapshot_save_job_bh *HAS* to run on the main thread? > 5. bdrv_writev_vmstate is executed as part of the usual savevm setup. > 6. bdrv_writev_vmstate is a generated coroutine wrapper, so it uses > AIO_WAIT_WHILE. > 7. The assertion to have the main thread's AioContext inside the > AIO_WAIT_WHILE macro fails. several problems here: a- There is no "test/qtest/snapshot-test" around Hint, Hint. Basically snapshots are the bastard sibling of migration, and nobody really tests them. b- In an ideal world, migration shouldn't need a bottom_handler Remember, it has its own thread. But there are functions that can only been called from the main thread. And no, I don't remember which, I just try very hard not to touch that part of the code. c- At that point the vcpus are stopped, for migration it doesn't matter a lot(*) to have to use a bottom handler. d- snapshots are a completely different beast, that don't really stop the guest in the same way at that point, and sometimes it shows in this subtle details. I am sorry that I can't be of more help. I still think that the block layer people should have a clue at what is going on here. Later, Juan. *: That is not completely true. There is no way to get back from the bottom handler to the iterative stage of migration. If something happens at that point, we just have to cancel migration. 
But we will leave this discussion for another day. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll 2023-04-28 7:47 ` Juan Quintela @ 2023-04-28 8:30 ` Kevin Wolf 2023-04-28 8:38 ` Juan Quintela 0 siblings, 1 reply; 21+ messages in thread From: Kevin Wolf @ 2023-04-28 8:30 UTC (permalink / raw) To: Juan Quintela Cc: Fiona Ebner, Paolo Bonzini, Markus Armbruster, QEMU Developers, open list:Block layer core, Michael Roth, Fam Zheng, Stefan Hajnoczi, Thomas Lamprecht, Peter Xu Am 28.04.2023 um 09:47 hat Juan Quintela geschrieben: > Fiona Ebner <f.ebner@proxmox.com> wrote: > > Am 27.04.23 um 16:36 schrieb Juan Quintela: > >> Fiona Ebner <f.ebner@proxmox.com> wrote: > >>> Am 27.04.23 um 13:03 schrieb Kevin Wolf: > >>>> Am 26.04.2023 um 16:31 hat Fiona Ebner geschrieben: > >>>>> Am 20.04.23 um 08:55 schrieb Paolo Bonzini: > >> > >> Hi > >> > >>> Our function is a custom variant of saving a snapshot and uses > >>> qemu_savevm_state_setup(), which is why the qemu_mutex_unlock_iothread() > >>> is there. I looked for inspiration for how upstream does things and it > >>> turns out that upstream QEMU v8.0.0 has essentially the same issue with > >>> snapshot-save. When snapshot_save_job_bh runs in a vCPU thread instead > >>> of the main thread, the situation is the same: after > >>> qemu_mutex_unlock_iothread(), qemu_get_current_aio_context() will return > >>> 0x0 and then the assertion in the AIO_WAIT_WHILE_INTERNAL macro fails > >>> (this time the generated coroutine wrapper is bdrv_writev_vmstate)[0]. > >>> > >>> > >>> So all bottom halves scheduled for the main thread's AioContext can > >>> potentially get to run in a vCPU thread and need to be very careful with > >>> things like qemu_mutex_unlock_iothread. > >>> > >>> Is qemu_get_current_aio_context() returning 0x0 expected? I haven't > >>> looked into why it happens yet. 
Does there need to be a way to drop the > >>> BQL without also giving up the main thread's AioContext or would it be > >>> enough to re-acquire the context? > >>> > >>> CC-ing Juan as the migration maintainer. > >> > >> This is the world backwards. > >> The tradition is that migration people blame block layer people for > >> breaking things and for help, not the other way around O:-) > > > > Sorry, if I didn't provide enough context/explanation. See below for my > > attempt to re-iterate. I CC'ed you, because the issue happens as part of > > snapshot-save and in particular the qemu_mutex_unlock_iothread call in > > qemu_savevm_state is one of the ingredients leading to the problem. > > This was a joke O:-) > > >> To see that I am understading this right: > >> > >> - you create a thread > >> - that calls a memory_region operation > >> - that calls a device write function > >> - that calls the block layer > >> - that creates a snapshot > >> - that calls the migration code > >> - that calls the block layer again > >> > >> Without further investigation, I have no clue what is going on here, > >> sorry. > >> > >> Later, Juan. > >> > > > > All I'm doing is using QEMU (a build of upstream's v8.0.0) in intended > > ways, I promise! In particular, I'm doing two things at the same time > > repeatedly: > > 1. Write to a pflash drive from within the guest. > > 2. Issue a snapshot-save QMP command (in a way that doesn't lead to an > > early error). > > > > (I actually also used a debugger to break on pflash_update and > > snapshot_save_job_bh, manually continuing until I triggered the > > problematic situation. It's very racy, because it depends on the host OS > > to switch threads at the correct time.) > > I think the block layer is the problem (famous last words) > > > > > Now we need to be aware of two things: > > 1. 
As discussed earlier in the mail thread, if the host OS switches > > threads at an inconvenient time, it can happen that a bottom half > > scheduled for the main thread's AioContext can be executed as part of a > > vCPU thread's aio_poll. > > 2. Generated coroutine wrappers for block layer functions spawn the > > coroutine and use AIO_WAIT_WHILE/aio_poll to wait for it to finish. > > > > What happens in the backtrace above is: > > 1. The write to the pflash drive uses blk_pwrite which leads to an > > aio_poll in the vCPU thread. > > 2. The snapshot_save_job_bh bottom half, that was scheduled for the main > > thread's AioContext, is executed as part of the vCPU thread's aio_poll. > > 3. qemu_savevm_state is called. > > 4. qemu_mutex_unlock_iothread is called. Now > > qemu_get_current_aio_context returns 0x0. Usually, snapshot_save_job_bh > > runs in the main thread, in which case qemu_get_current_aio_context > > still returns the main thread's AioContext at this point. > > I am perhaps a bit ingenuous here, but it is there a way to convince > qemu that snapshot_save_job_bh *HAS* to run on the main thread? I believe we're talking about a technicality here. I asked another more fundamental question that nobody has answered yet: Why do you think that it's ok to call bdrv_writev_vmstate() without holding the BQL? Because if we come to the conclusion that it's not ok (which is what I think), then it doesn't matter whether we violate the condition in the main thread or a vcpu thread. It's wrong in both cases, just the failure mode differs - one crashes spectacularly with an assertion failure, the other has a race condition. Moving from the assertion failure to a race condition is not a proper fix. > > 5. bdrv_writev_vmstate is executed as part of the usual savevm setup. > > 6. bdrv_writev_vmstate is a generated coroutine wrapper, so it uses > > AIO_WAIT_WHILE. > > 7. The assertion to have the main thread's AioContext inside the > > AIO_WAIT_WHILE macro fails. 
Kevin
* Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll 2023-04-28 8:30 ` Kevin Wolf @ 2023-04-28 8:38 ` Juan Quintela 2023-04-28 12:22 ` Kevin Wolf 0 siblings, 1 reply; 21+ messages in thread From: Juan Quintela @ 2023-04-28 8:38 UTC (permalink / raw) To: Kevin Wolf Cc: Fiona Ebner, Paolo Bonzini, Markus Armbruster, QEMU Developers, open list:Block layer core, Michael Roth, Fam Zheng, Stefan Hajnoczi, Thomas Lamprecht, Peter Xu Kevin Wolf <kwolf@redhat.com> wrote: >> >> I am perhaps a bit ingenuous here, but is there a way to convince >> qemu that snapshot_save_job_bh *HAS* to run on the main thread? > > I believe we're talking about a technicality here. I asked another more > fundamental question that nobody has answered yet: > > Why do you think that it's ok to call bdrv_writev_vmstate() without > holding the BQL? I will say this function starts with bdrv_ (i.e. block layer people) and ends with _vmstate (i.e. migration people). To be honest, I don't know. That is why I _supposed_ you have an idea. > Because if we come to the conclusion that it's not ok (which is what I > think), then it doesn't matter whether we violate the condition in the > main thread or a vcpu thread. It's wrong in both cases, just the failure > mode differs - one crashes spectacularly with an assertion failure, the > other has a race condition. Moving from the assertion failure to a race > condition is not a proper fix. Fully agree there. Just that I don't know the answer. Later, Juan.
* Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll 2023-04-28 8:38 ` Juan Quintela @ 2023-04-28 12:22 ` Kevin Wolf 2023-04-28 16:54 ` Juan Quintela 0 siblings, 1 reply; 21+ messages in thread From: Kevin Wolf @ 2023-04-28 12:22 UTC (permalink / raw) To: Juan Quintela Cc: Fiona Ebner, Paolo Bonzini, Markus Armbruster, QEMU Developers, open list:Block layer core, Michael Roth, Fam Zheng, Stefan Hajnoczi, Thomas Lamprecht, Peter Xu Am 28.04.2023 um 10:38 hat Juan Quintela geschrieben: > Kevin Wolf <kwolf@redhat.com> wrote: > >> I am perhaps a bit ingenuous here, but it is there a way to convince > >> qemu that snapshot_save_job_bh *HAS* to run on the main thread? > > > > I believe we're talking about a technicality here. I asked another more > > fundamental question that nobody has answered yet: > > > > Why do you think that it's ok to call bdrv_writev_vmstate() without > > holding the BQL? > > I will say this function starts by bdrv_ (i.e. block layer people) and > endes with _vmstate (i.e. migration people). > > To be honest, I don't know. That is why I _supposed_ you have an idea. My idea is that bdrv_*() can only be called when you hold the BQL, or for BlockDriverStates in an iothread the AioContext lock. Apparently dropping the BQL in migration code was introduced in Paolo's commit 9b095037527. I'm not sure what this was supposed to improve in the case of snapshots because the VM is stopped anyway. Would anything bad happen if we removed the BQL unlock/lock section in qemu_savevm_state() again? Kevin ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll 2023-04-28 12:22 ` Kevin Wolf @ 2023-04-28 16:54 ` Juan Quintela 2023-05-02 10:03 ` Fiona Ebner 0 siblings, 1 reply; 21+ messages in thread From: Juan Quintela @ 2023-04-28 16:54 UTC (permalink / raw) To: Kevin Wolf Cc: Fiona Ebner, Paolo Bonzini, Markus Armbruster, QEMU Developers, open list:Block layer core, Michael Roth, Fam Zheng, Stefan Hajnoczi, Thomas Lamprecht, Peter Xu Kevin Wolf <kwolf@redhat.com> wrote: > Am 28.04.2023 um 10:38 hat Juan Quintela geschrieben: >> Kevin Wolf <kwolf@redhat.com> wrote: >> >> I am perhaps a bit ingenuous here, but it is there a way to convince >> >> qemu that snapshot_save_job_bh *HAS* to run on the main thread? >> > >> > I believe we're talking about a technicality here. I asked another more >> > fundamental question that nobody has answered yet: >> > >> > Why do you think that it's ok to call bdrv_writev_vmstate() without >> > holding the BQL? >> >> I will say this function starts by bdrv_ (i.e. block layer people) and >> endes with _vmstate (i.e. migration people). >> >> To be honest, I don't know. That is why I _supposed_ you have an idea. > > My idea is that bdrv_*() can only be called when you hold the BQL, or > for BlockDriverStates in an iothread the AioContext lock. > > Apparently dropping the BQL in migration code was introduced in Paolo's > commit 9b095037527. Damn. I reviewed it, so I am as guilty as the author. 10 years later without problems I will not blame that patch. I guess we changed something else that broke doing it without the lock. But no, I still don't have suggestions/ideas. > I'm not sure what this was supposed to improve in > the case of snapshots because the VM is stopped anyway. > > Would anything bad happen if we removed the BQL unlock/lock section in > qemu_savevm_state() again? Dunno. For what is worth, I can say that it survives migration-test, but don't ask me why/how/... 
Fiona, can you check if it fixes your troubles? Later, Juan. > Kevin
* Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll 2023-04-28 16:54 ` Juan Quintela @ 2023-05-02 10:03 ` Fiona Ebner 2023-05-02 10:25 ` Fiona Ebner 2023-05-02 10:30 ` Juan Quintela 0 siblings, 2 replies; 21+ messages in thread From: Fiona Ebner @ 2023-05-02 10:03 UTC (permalink / raw) To: quintela, Kevin Wolf Cc: Paolo Bonzini, Markus Armbruster, QEMU Developers, open list:Block layer core, Michael Roth, Fam Zheng, Stefan Hajnoczi, Thomas Lamprecht, Peter Xu Am 28.04.23 um 18:54 schrieb Juan Quintela: > Kevin Wolf <kwolf@redhat.com> wrote: >> Am 28.04.2023 um 10:38 hat Juan Quintela geschrieben: >>> Kevin Wolf <kwolf@redhat.com> wrote: >>>>> I am perhaps a bit ingenuous here, but it is there a way to convince >>>>> qemu that snapshot_save_job_bh *HAS* to run on the main thread? >>>> >>>> I believe we're talking about a technicality here. I asked another more >>>> fundamental question that nobody has answered yet: >>>> >>>> Why do you think that it's ok to call bdrv_writev_vmstate() without >>>> holding the BQL? >>> >>> I will say this function starts by bdrv_ (i.e. block layer people) and >>> endes with _vmstate (i.e. migration people). >>> >>> To be honest, I don't know. That is why I _supposed_ you have an idea. >> >> My idea is that bdrv_*() can only be called when you hold the BQL, or >> for BlockDriverStates in an iothread the AioContext lock. >> >> Apparently dropping the BQL in migration code was introduced in Paolo's >> commit 9b095037527. > > Damn. I reviewed it, so I am as guilty as the author. > 10 years later without problems I will not blame that patch. > > I guess we changed something else that broke doing it without the lock. > > But no, I still don't have suggestions/ideas. > I do feel like the issue might be very difficult to trigger under normal circumstances. 
Depending on the configuration and what you do in the guest, aio_poll in a vCPU thread does not happen often and I imagine snapshot-save is also not a super frequent operation for most people. It still takes me a while to trigger the issue by issuing lots of pflash writes and running snapshot-save in a loop, I'd guess about 30-60 snapshots. Another reason might be that generated co-wrappers were less common in the past? >> I'm not sure what this was supposed to improve in >> the case of snapshots because the VM is stopped anyway. Is it? Quoting Juan:> d- snapshots are a completely different beast, that don't really stop > the guest in the same way at that point, and sometimes it shows in > this subtle details. >> Would anything bad happen if we removed the BQL unlock/lock section in >> qemu_savevm_state() again? > > Dunno. > > For what is worth, I can say that it survives migration-test, but don't > ask me why/how/... > > Fiona, can you check if it fixes your troubles? > Just removing the single section in qemu_savevm_state() breaks even the case where snapshot_save_job_bh() is executed in the main thread, because ram_init_bitmaps() will call qemu_mutex_lock_iothread_impl() which asserts that it's not already locked. Also removing the lock/unlock pair in ram_init_bitmaps() seems to work. But I'm not sure what else a full semantic revert of commit 9b095037527 would entail. Best Regards, Fiona ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll 2023-05-02 10:03 ` Fiona Ebner @ 2023-05-02 10:25 ` Fiona Ebner 2023-05-02 10:35 ` Juan Quintela 2023-05-02 10:30 ` Juan Quintela 1 sibling, 1 reply; 21+ messages in thread From: Fiona Ebner @ 2023-05-02 10:25 UTC (permalink / raw) To: quintela, Kevin Wolf Cc: Paolo Bonzini, Markus Armbruster, QEMU Developers, open list:Block layer core, Michael Roth, Fam Zheng, Stefan Hajnoczi, Thomas Lamprecht, Peter Xu Am 02.05.23 um 12:03 schrieb Fiona Ebner: > Am 28.04.23 um 18:54 schrieb Juan Quintela: >> Kevin Wolf <kwolf@redhat.com> wrote: >>> Am 28.04.2023 um 10:38 hat Juan Quintela geschrieben: >>>> Kevin Wolf <kwolf@redhat.com> wrote: >>>>>> I am perhaps a bit ingenuous here, but it is there a way to convince >>>>>> qemu that snapshot_save_job_bh *HAS* to run on the main thread? >>>>> >>>>> I believe we're talking about a technicality here. I asked another more >>>>> fundamental question that nobody has answered yet: >>>>> >>>>> Why do you think that it's ok to call bdrv_writev_vmstate() without >>>>> holding the BQL? >>>> >>>> I will say this function starts by bdrv_ (i.e. block layer people) and >>>> endes with _vmstate (i.e. migration people). >>>> >>>> To be honest, I don't know. That is why I _supposed_ you have an idea. >>> >>> My idea is that bdrv_*() can only be called when you hold the BQL, or >>> for BlockDriverStates in an iothread the AioContext lock. >>> >>> Apparently dropping the BQL in migration code was introduced in Paolo's >>> commit 9b095037527. >> >> Damn. I reviewed it, so I am as guilty as the author. >> 10 years later without problems I will not blame that patch. >> >> I guess we changed something else that broke doing it without the lock. >> >> But no, I still don't have suggestions/ideas. >> > > I do feel like the issue might be very difficult to trigger under normal > circumstances. 
Depending on the configuration and what you do in the > guest, aio_poll in a vCPU thread does not happen often and I imagine > snapshot-save is also not a super frequent operation for most people. It > still takes me a while to trigger the issue by issuing lots of pflash > writes and running snapshot-save in a loop, I'd guess about 30-60 > snapshots. Another reason might be that generated co-wrappers were less > common in the past? > >>> I'm not sure what this was supposed to improve in >>> the case of snapshots because the VM is stopped anyway. > > Is it? Quoting Juan:> d- snapshots are a completely different beast, > that don't really stop >> the guest in the same way at that point, and sometimes it shows in >> this subtle details. > >>> Would anything bad happen if we removed the BQL unlock/lock section in >>> qemu_savevm_state() again? >> >> Dunno. >> >> For what is worth, I can say that it survives migration-test, but don't >> ask me why/how/... >> >> Fiona, can you check if it fixes your troubles? >> > > Just removing the single section in qemu_savevm_state() breaks even the > case where snapshot_save_job_bh() is executed in the main thread, > because ram_init_bitmaps() will call qemu_mutex_lock_iothread_impl() > which asserts that it's not already locked. > > Also removing the lock/unlock pair in ram_init_bitmaps() seems to work. Well, after a few more attempts, I got a new failure (running with the two changes mentioned above), but it seems to happen later and, at a first glance, doesn't seem to be related to the lock anymore: > qemu-system-x86_64: ../hw/net/virtio-net.c:3811: virtio_net_pre_save: Assertion `!n->vhost_started' failed. > (gdb) call qemu_mutex_iothread_locked() > $1 = true Backtrace: > Thread 21 "CPU 0/KVM" received signal SIGABRT, Aborted. > [Switching to Thread 0x7fd291ffb700 (LWP 136620)] > __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 > 50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. 
> (gdb) bt > #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 > #1 0x00007fd2b8b3e537 in __GI_abort () at abort.c:79 > #2 0x00007fd2b8b3e40f in __assert_fail_base (fmt=0x7fd2b8cb66a8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x557e0ebb7193 "!n->vhost_started", file=0x557e0ebb65f0 "../hw/net/virtio-net.c", line=3811, function=<optimized out>) at assert.c:92 > #3 0x00007fd2b8b4d662 in __GI___assert_fail (assertion=0x557e0ebb7193 "!n->vhost_started", file=0x557e0ebb65f0 "../hw/net/virtio-net.c", line=3811, function=0x557e0ebb7740 <__PRETTY_FUNCTION__.3> "virtio_net_pre_save") at assert.c:101 > #4 0x0000557e0e6c21c9 in virtio_net_pre_save (opaque=0x557e12c480b0) at ../hw/net/virtio-net.c:3811 > #5 0x0000557e0e4f63ee in vmstate_save_state_v (f=0x7fd28442f730, vmsd=0x557e0efb53c0 <vmstate_virtio_net>, opaque=0x557e12c480b0, vmdesc=0x7fd2840978e0, version_id=11) at ../migration/vmstate.c:330 > #6 0x0000557e0e4f6390 in vmstate_save_state (f=0x7fd28442f730, vmsd=0x557e0efb53c0 <vmstate_virtio_net>, opaque=0x557e12c480b0, vmdesc_id=0x7fd2840978e0) at ../migration/vmstate.c:318 > #7 0x0000557e0e51c80a in vmstate_save (f=0x7fd28442f730, se=0x557e12c4c430, vmdesc=0x7fd2840978e0) at ../migration/savevm.c:1000 > #8 0x0000557e0e51d942 in qemu_savevm_state_complete_precopy_non_iterable (f=0x7fd28442f730, in_postcopy=false, inactivate_disks=false) at ../migration/savevm.c:1463 > #9 0x0000557e0e51db33 in qemu_savevm_state_complete_precopy (f=0x7fd28442f730, iterable_only=false, inactivate_disks=false) at ../migration/savevm.c:1529 > #10 0x0000557e0e51df3d in qemu_savevm_state (f=0x7fd28442f730, errp=0x7fd28425e1f8) at ../migration/savevm.c:1635 > #11 0x0000557e0e520548 in save_snapshot (name=0x7fd28425e2b0 "snap0", overwrite=false, vmstate=0x7fd284479c20 "scsi0", has_devices=true, devices=0x7fd284097920, errp=0x7fd28425e1f8) at ../migration/savevm.c:2952 > #12 0x0000557e0e520f9b in snapshot_save_job_bh (opaque=0x7fd28425e130) at 
../migration/savevm.c:3251 > #13 0x0000557e0e9f94de in aio_bh_call (bh=0x7fd2844703f0) at ../util/async.c:155 > #14 0x0000557e0e9f95e9 in aio_bh_poll (ctx=0x557e110ae910) at ../util/async.c:184 > #15 0x0000557e0e9dd78c in aio_poll (ctx=0x557e110ae910, blocking=true) at ../util/aio-posix.c:721 > #16 0x0000557e0e836ef0 in bdrv_poll_co (s=0x7fd291ff5eb0) at /home/febner/repos/qemu/block/block-gen.h:43 > #17 0x0000557e0e839c0e in blk_pwrite (blk=0x557e11320f40, offset=58880, bytes=512, buf=0x7fd290c0e600, flags=0) at block/block-gen.c:1650 > #18 0x0000557e0e2b2078 in pflash_update (pfl=0x557e11303b30, offset=58880, size=1) at ../hw/block/pflash_cfi01.c:394 > #19 0x0000557e0e2b2749 in pflash_write (pfl=0x557e11303b30, offset=59338, value=62, width=1, be=0) at ../hw/block/pflash_cfi01.c:522 > #20 0x0000557e0e2b2cda in pflash_mem_write_with_attrs (opaque=0x557e11303b30, addr=59338, value=62, len=1, attrs=...) at ../hw/block/pflash_cfi01.c:681 > #21 0x0000557e0e733354 in memory_region_write_with_attrs_accessor (mr=0x557e11303ef0, addr=59338, value=0x7fd291ff60c8, size=1, shift=0, mask=255, attrs=...) at ../softmmu/memory.c:514 > #22 0x0000557e0e733493 in access_with_adjusted_size (addr=59338, value=0x7fd291ff60c8, size=1, access_size_min=1, access_size_max=4, access_fn=0x557e0e73325a <memory_region_write_with_attrs_accessor>, mr=0x557e11303ef0, attrs=...) at ../softmmu/memory.c:555 > #23 0x0000557e0e7365c8 in memory_region_dispatch_write (mr=0x557e11303ef0, addr=59338, data=62, op=MO_8, attrs=...) 
at ../softmmu/memory.c:1522 > #24 0x0000557e0e7436de in flatview_write_continue (fv=0x7fd2848bce90, addr=4290832330, attrs=..., ptr=0x7fd2ba9b7028, len=1, addr1=59338, l=1, mr=0x557e11303ef0) at ../softmmu/physmem.c:2641 > #25 0x0000557e0e743841 in flatview_write (fv=0x7fd2848bce90, addr=4290832330, attrs=..., buf=0x7fd2ba9b7028, len=1) at ../softmmu/physmem.c:2683 > #26 0x0000557e0e743bf1 in address_space_write (as=0x557e0f3abb20 <address_space_memory>, addr=4290832330, attrs=..., buf=0x7fd2ba9b7028, len=1) at ../softmmu/physmem.c:2779 > #27 0x0000557e0e743c5e in address_space_rw (as=0x557e0f3abb20 <address_space_memory>, addr=4290832330, attrs=..., buf=0x7fd2ba9b7028, len=1, is_write=true) at ../softmmu/physmem.c:2789 > #28 0x0000557e0e7d7a5c in kvm_cpu_exec (cpu=0x557e11676910) at ../accel/kvm/kvm-all.c:2989 > #29 0x0000557e0e7da76e in kvm_vcpu_thread_fn (arg=0x557e11676910) at ../accel/kvm/kvm-accel-ops.c:51 > #30 0x0000557e0e9e2233 in qemu_thread_start (args=0x557e11553fb0) at ../util/qemu-thread-posix.c:541 > #31 0x00007fd2b9b17ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477 > #32 0x00007fd2b8c18a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll 2023-05-02 10:25 ` Fiona Ebner @ 2023-05-02 10:35 ` Juan Quintela 2023-05-02 12:49 ` Fiona Ebner 0 siblings, 1 reply; 21+ messages in thread From: Juan Quintela @ 2023-05-02 10:35 UTC (permalink / raw) To: Fiona Ebner Cc: Kevin Wolf, Paolo Bonzini, Markus Armbruster, QEMU Developers, open list:Block layer core, Michael Roth, Fam Zheng, Stefan Hajnoczi, Thomas Lamprecht, Peter Xu Fiona Ebner <f.ebner@proxmox.com> wrote: > Am 02.05.23 um 12:03 schrieb Fiona Ebner: >> Am 28.04.23 um 18:54 schrieb Juan Quintela: >>> Kevin Wolf <kwolf@redhat.com> wrote: >>>> Am 28.04.2023 um 10:38 hat Juan Quintela geschrieben: >>>>> Kevin Wolf <kwolf@redhat.com> wrote: >>>>>>> I am perhaps a bit ingenuous here, but it is there a way to convince >>>>>>> qemu that snapshot_save_job_bh *HAS* to run on the main thread? >>>>>> >>>>>> I believe we're talking about a technicality here. I asked another more >>>>>> fundamental question that nobody has answered yet: >>>>>> >>>>>> Why do you think that it's ok to call bdrv_writev_vmstate() without >>>>>> holding the BQL? >>>>> >>>>> I will say this function starts by bdrv_ (i.e. block layer people) and >>>>> endes with _vmstate (i.e. migration people). >>>>> >>>>> To be honest, I don't know. That is why I _supposed_ you have an idea. >>>> >>>> My idea is that bdrv_*() can only be called when you hold the BQL, or >>>> for BlockDriverStates in an iothread the AioContext lock. >>>> >>>> Apparently dropping the BQL in migration code was introduced in Paolo's >>>> commit 9b095037527. >>> >>> Damn. I reviewed it, so I am as guilty as the author. >>> 10 years later without problems I will not blame that patch. >>> >>> I guess we changed something else that broke doing it without the lock. >>> >>> But no, I still don't have suggestions/ideas. >>> >> >> I do feel like the issue might be very difficult to trigger under normal >> circumstances. 
Depending on the configuration and what you do in the >> guest, aio_poll in a vCPU thread does not happen often and I imagine >> snapshot-save is also not a super frequent operation for most people. It >> still takes me a while to trigger the issue by issuing lots of pflash >> writes and running snapshot-save in a loop, I'd guess about 30-60 >> snapshots. Another reason might be that generated co-wrappers were less >> common in the past? >> >>>> I'm not sure what this was supposed to improve in >>>> the case of snapshots because the VM is stopped anyway. >> >> Is it? Quoting Juan:> d- snapshots are a completely different beast, >> that don't really stop >>> the guest in the same way at that point, and sometimes it shows in >>> this subtle details. >> >>>> Would anything bad happen if we removed the BQL unlock/lock section in >>>> qemu_savevm_state() again? >>> >>> Dunno. >>> >>> For what is worth, I can say that it survives migration-test, but don't >>> ask me why/how/... >>> >>> Fiona, can you check if it fixes your troubles? >>> >> >> Just removing the single section in qemu_savevm_state() breaks even the >> case where snapshot_save_job_bh() is executed in the main thread, >> because ram_init_bitmaps() will call qemu_mutex_lock_iothread_impl() >> which asserts that it's not already locked. >> >> Also removing the lock/unlock pair in ram_init_bitmaps() seems to work. > > Well, after a few more attempts, I got a new failure (running with the > two changes mentioned above), but it seems to happen later and, at a > first glance, doesn't seem to be related to the lock anymore: Can you revert the whole commit: commmit 9b0950375277467fd74a9075624477ae43b9bb22 Author: Paolo Bonzini <pbonzini@redhat.com> Date: Fri Feb 22 17:36:28 2013 +0100 migration: run setup callbacks out of big lock Because we are again at: > >> Thread 21 "CPU 0/KVM" received signal SIGABRT, Aborted. 
>> [Switching to Thread 0x7fd291ffb700 (LWP 136620)] >> __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 >> 50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. >> (gdb) bt >> #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 >> #1 0x00007fd2b8b3e537 in __GI_abort () at abort.c:79 >> #2 0x00007fd2b8b3e40f in __assert_fail_base (fmt=0x7fd2b8cb66a8 >> "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", >> assertion=0x557e0ebb7193 "!n->vhost_started", file=0x557e0ebb65f0 >> "../hw/net/virtio-net.c", line=3811, function=<optimized out>) at >> assert.c:92 >> #3 0x00007fd2b8b4d662 in __GI___assert_fail >> (assertion=0x557e0ebb7193 "!n->vhost_started", file=0x557e0ebb65f0 >> "../hw/net/virtio-net.c", line=3811, function=0x557e0ebb7740 >> <__PRETTY_FUNCTION__.3> "virtio_net_pre_save") at assert.c:101 >> #4 0x0000557e0e6c21c9 in virtio_net_pre_save (opaque=0x557e12c480b0) at ../hw/net/virtio-net.c:3811 >> #5 0x0000557e0e4f63ee in vmstate_save_state_v (f=0x7fd28442f730, >> vmsd=0x557e0efb53c0 <vmstate_virtio_net>, opaque=0x557e12c480b0, >> vmdesc=0x7fd2840978e0, version_id=11) at ../migration/vmstate.c:330 >> #6 0x0000557e0e4f6390 in vmstate_save_state (f=0x7fd28442f730, >> vmsd=0x557e0efb53c0 <vmstate_virtio_net>, opaque=0x557e12c480b0, >> vmdesc_id=0x7fd2840978e0) at ../migration/vmstate.c:318 >> #7 0x0000557e0e51c80a in vmstate_save (f=0x7fd28442f730, >> se=0x557e12c4c430, vmdesc=0x7fd2840978e0) at >> ../migration/savevm.c:1000 >> #8 0x0000557e0e51d942 in >> qemu_savevm_state_complete_precopy_non_iterable (f=0x7fd28442f730, >> in_postcopy=false, inactivate_disks=false) at >> ../migration/savevm.c:1463 >> #9 0x0000557e0e51db33 in qemu_savevm_state_complete_precopy >> (f=0x7fd28442f730, iterable_only=false, inactivate_disks=false) at >> ../migration/savevm.c:1529 >> #10 0x0000557e0e51df3d in qemu_savevm_state (f=0x7fd28442f730, errp=0x7fd28425e1f8) at ../migration/savevm.c:1635 >> #11 0x0000557e0e520548 in save_snapshot 
(name=0x7fd28425e2b0 >> "snap0", overwrite=false, vmstate=0x7fd284479c20 "scsi0", >> has_devices=true, devices=0x7fd284097920, errp=0x7fd28425e1f8) at >> ../migration/savevm.c:2952 >> #12 0x0000557e0e520f9b in snapshot_save_job_bh (opaque=0x7fd28425e130) at ../migration/savevm.c:3251 Here we are aio_bh_call() ends calling snapshot_save_job_bh, and it didn't end well. It appears that there is not an easy way to warantee that snapshot code is only run on the main io thread, so my only other suggestion right now is that you check snapshot_save_job_bh() and see if it ever happens on a non-vcpu thread when you get the test to run correctly. Later, Juan. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll 2023-05-02 10:35 ` Juan Quintela @ 2023-05-02 12:49 ` Fiona Ebner 0 siblings, 0 replies; 21+ messages in thread From: Fiona Ebner @ 2023-05-02 12:49 UTC (permalink / raw) To: quintela, Kevin Wolf Cc: Paolo Bonzini, Markus Armbruster, QEMU Developers, open list:Block layer core, Michael Roth, Fam Zheng, Stefan Hajnoczi, Thomas Lamprecht, Peter Xu Am 02.05.23 um 12:35 schrieb Juan Quintela: > Fiona Ebner <f.ebner@proxmox.com> wrote: >> Am 02.05.23 um 12:03 schrieb Fiona Ebner: >>> Am 28.04.23 um 18:54 schrieb Juan Quintela: >>>> Kevin Wolf <kwolf@redhat.com> wrote: >>>>> Am 28.04.2023 um 10:38 hat Juan Quintela geschrieben: >>>>>> Kevin Wolf <kwolf@redhat.com> wrote: >>>>>>>> I am perhaps a bit ingenuous here, but it is there a way to convince >>>>>>>> qemu that snapshot_save_job_bh *HAS* to run on the main thread? >>>>>>> >>>>>>> I believe we're talking about a technicality here. I asked another more >>>>>>> fundamental question that nobody has answered yet: >>>>>>> >>>>>>> Why do you think that it's ok to call bdrv_writev_vmstate() without >>>>>>> holding the BQL? >>>>>> >>>>>> I will say this function starts by bdrv_ (i.e. block layer people) and >>>>>> endes with _vmstate (i.e. migration people). >>>>>> >>>>>> To be honest, I don't know. That is why I _supposed_ you have an idea. >>>>> >>>>> My idea is that bdrv_*() can only be called when you hold the BQL, or >>>>> for BlockDriverStates in an iothread the AioContext lock. >>>>> >>>>> Apparently dropping the BQL in migration code was introduced in Paolo's >>>>> commit 9b095037527. >>>> >>>> Damn. I reviewed it, so I am as guilty as the author. >>>> 10 years later without problems I will not blame that patch. >>>> >>>> I guess we changed something else that broke doing it without the lock. >>>> >>>> But no, I still don't have suggestions/ideas. 
>>>>
>>>
>>> I do feel like the issue might be very difficult to trigger under normal
>>> circumstances. Depending on the configuration and what you do in the
>>> guest, aio_poll in a vCPU thread does not happen often and I imagine
>>> snapshot-save is also not a super frequent operation for most people. It
>>> still takes me a while to trigger the issue by issuing lots of pflash
>>> writes and running snapshot-save in a loop, I'd guess about 30-60
>>> snapshots. Another reason might be that generated co-wrappers were less
>>> common in the past?
>>>
>>>>> I'm not sure what this was supposed to improve in
>>>>> the case of snapshots because the VM is stopped anyway.
>>>
>>> Is it? Quoting Juan:> d- snapshots are a completely different beast,
>>> that don't really stop
>>>> the guest in the same way at that point, and sometimes it shows in
>>>> this subtle details.
>>>
>>>>> Would anything bad happen if we removed the BQL unlock/lock section in
>>>>> qemu_savevm_state() again?
>>>>
>>>> Dunno.
>>>>
>>>> For what is worth, I can say that it survives migration-test, but don't
>>>> ask me why/how/...
>>>>
>>>> Fiona, can you check if it fixes your troubles?
>>>>
>>>
>>> Just removing the single section in qemu_savevm_state() breaks even the
>>> case where snapshot_save_job_bh() is executed in the main thread,
>>> because ram_init_bitmaps() will call qemu_mutex_lock_iothread_impl()
>>> which asserts that it's not already locked.
>>>
>>> Also removing the lock/unlock pair in ram_init_bitmaps() seems to work.
>>
>> Well, after a few more attempts, I got a new failure (running with the
>> two changes mentioned above), but it seems to happen later and, at a
>> first glance, doesn't seem to be related to the lock anymore:
>
> Can you revert the whole commit:
>
> commit 9b0950375277467fd74a9075624477ae43b9bb22
> Author: Paolo Bonzini <pbonzini@redhat.com>
> Date:   Fri Feb 22 17:36:28 2013 +0100
>
>     migration: run setup callbacks out of big lock

Unfortunately, there were just too many changes. git revert won't even
detect the renamed files (e.g. migration.c -> migration/migration.c) for
me. I tried to revert manually, but I noticed another issue. A full
revert would also mean that migration holds the BQL around the
qemu_savevm_state_setup() call. But if we don't want that, we can't
remove the qemu_mutex_lock_iothread() call in ram_init_bitmaps() (and
similar for block_save_setup()), because migration relies on that. I
guess we could make the lock+unlock calls conditional, depending on
whether the BQL is already held, but it doesn't seem too nice either.

> Because we are again at:
>>
>>> Thread 21 "CPU 0/KVM" received signal SIGABRT, Aborted.
>>> [Switching to Thread 0x7fd291ffb700 (LWP 136620)]
>>> __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
>>> 50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>>> (gdb) bt
>>> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
>>> #1  0x00007fd2b8b3e537 in __GI_abort () at abort.c:79
>>> #2  0x00007fd2b8b3e40f in __assert_fail_base (fmt=0x7fd2b8cb66a8
>>> "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
>>> assertion=0x557e0ebb7193 "!n->vhost_started", file=0x557e0ebb65f0
>>> "../hw/net/virtio-net.c", line=3811, function=<optimized out>) at
>>> assert.c:92
>>> #3  0x00007fd2b8b4d662 in __GI___assert_fail
>>> (assertion=0x557e0ebb7193 "!n->vhost_started", file=0x557e0ebb65f0
>>> "../hw/net/virtio-net.c", line=3811, function=0x557e0ebb7740
>>> <__PRETTY_FUNCTION__.3> "virtio_net_pre_save") at assert.c:101
>>> #4  0x0000557e0e6c21c9 in virtio_net_pre_save (opaque=0x557e12c480b0) at ../hw/net/virtio-net.c:3811
>>> #5  0x0000557e0e4f63ee in vmstate_save_state_v (f=0x7fd28442f730,
>>> vmsd=0x557e0efb53c0 <vmstate_virtio_net>, opaque=0x557e12c480b0,
>>> vmdesc=0x7fd2840978e0, version_id=11) at ../migration/vmstate.c:330
>>> #6  0x0000557e0e4f6390 in vmstate_save_state (f=0x7fd28442f730,
>>> vmsd=0x557e0efb53c0 <vmstate_virtio_net>, opaque=0x557e12c480b0,
>>> vmdesc_id=0x7fd2840978e0) at ../migration/vmstate.c:318
>>> #7  0x0000557e0e51c80a in vmstate_save (f=0x7fd28442f730,
>>> se=0x557e12c4c430, vmdesc=0x7fd2840978e0) at
>>> ../migration/savevm.c:1000
>>> #8  0x0000557e0e51d942 in
>>> qemu_savevm_state_complete_precopy_non_iterable (f=0x7fd28442f730,
>>> in_postcopy=false, inactivate_disks=false) at
>>> ../migration/savevm.c:1463
>>> #9  0x0000557e0e51db33 in qemu_savevm_state_complete_precopy
>>> (f=0x7fd28442f730, iterable_only=false, inactivate_disks=false) at
>>> ../migration/savevm.c:1529
>>> #10 0x0000557e0e51df3d in qemu_savevm_state (f=0x7fd28442f730, errp=0x7fd28425e1f8) at ../migration/savevm.c:1635
>>> #11 0x0000557e0e520548 in save_snapshot (name=0x7fd28425e2b0
>>> "snap0", overwrite=false, vmstate=0x7fd284479c20 "scsi0",
>>> has_devices=true, devices=0x7fd284097920, errp=0x7fd28425e1f8) at
>>> ../migration/savevm.c:2952
>>> #12 0x0000557e0e520f9b in snapshot_save_job_bh (opaque=0x7fd28425e130) at ../migration/savevm.c:3251
>
> Here we are
>
> aio_bh_call()
>
> ends calling snapshot_save_job_bh, and it didn't end well.
>
> It appears that there is not an easy way to guarantee that snapshot code
> is only run on the main io thread, so my only other suggestion right now
> is that you check snapshot_save_job_bh() and see if it ever happens on a
> non-vcpu thread when you get the test to run correctly.

Sorry, I'm not sure what you mean by "when you get the test to run
correctly". Yes, the assertion from above only happened in a vCPU thread
for me. While save_snapshot() calls vm_stop(RUN_STATE_SAVE_VM), I guess
it's because of the following?

> int vm_stop(RunState state)
> {
>     if (qemu_in_vcpu_thread()) {
>         qemu_system_vmstop_request_prepare();
>         qemu_system_vmstop_request(state);
>         /*
>          * FIXME: should not return to device code in case
>          * vm_stop() has been requested.
>          */
>         cpu_stop_current();
>         return 0;
>     }
>
>     return do_vm_stop(state, true);
> }

Would rescheduling snapshot_save_job_bh() until it runs in the main
thread be an option? But Kevin said:

> My idea is that bdrv_*() can only be called when you hold the BQL, or
> for BlockDriverStates in an iothread the AioContext lock

So IIUC, having snapshot code keep dropping the BQL during setup would
still not be fully correct even if we ensure that snapshot code only
runs in the main thread. But when doing qemu_mutex_unlock_iothread() in
the main thread, qemu_get_current_aio_context() still returns the main
thread's AioContext (it returns 0x0 when the same thing is done in the
vCPU thread), so is it fine after all?

Another alternative would be to check in the QIOChannelBlock
implementation if we require the main thread's AioContext and acquire
the BQL before calling the bdrv_*_vmstate() functions? But that seems
rather ugly too.

Best Regards,
Fiona

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll
  2023-05-02 10:03 ` Fiona Ebner
  2023-05-02 10:25   ` Fiona Ebner
@ 2023-05-02 10:30   ` Juan Quintela
  1 sibling, 0 replies; 21+ messages in thread
From: Juan Quintela @ 2023-05-02 10:30 UTC (permalink / raw)
To: Fiona Ebner
Cc: Kevin Wolf, Paolo Bonzini, Markus Armbruster, QEMU Developers,
	open list:Block layer core, Michael Roth, Fam Zheng,
	Stefan Hajnoczi, Thomas Lamprecht, Peter Xu

Fiona Ebner <f.ebner@proxmox.com> wrote:
> On 28.04.23 at 18:54, Juan Quintela wrote:
>> Kevin Wolf <kwolf@redhat.com> wrote:
>>> On 28.04.2023 at 10:38, Juan Quintela wrote:
>>>> Kevin Wolf <kwolf@redhat.com> wrote:
>>>>>> I am perhaps a bit ingenuous here, but is there a way to convince
>>>>>> qemu that snapshot_save_job_bh *HAS* to run on the main thread?
>>>>>
>>>>> I believe we're talking about a technicality here. I asked another more
>>>>> fundamental question that nobody has answered yet:
>>>>>
>>>>> Why do you think that it's ok to call bdrv_writev_vmstate() without
>>>>> holding the BQL?
>>>>
>>>> I will say this function starts with bdrv_ (i.e. block layer people) and
>>>> ends with _vmstate (i.e. migration people).
>>>>
>>>> To be honest, I don't know. That is why I _supposed_ you have an idea.
>>>
>>> My idea is that bdrv_*() can only be called when you hold the BQL, or
>>> for BlockDriverStates in an iothread the AioContext lock.
>>>
>>> Apparently dropping the BQL in migration code was introduced in Paolo's
>>> commit 9b095037527.
>>
>> Damn. I reviewed it, so I am as guilty as the author.
>> 10 years later without problems I will not blame that patch.
>>
>> I guess we changed something else that broke doing it without the lock.
>>
>> But no, I still don't have suggestions/ideas.
>>
>
> I do feel like the issue might be very difficult to trigger under normal
> circumstances.
> Depending on the configuration and what you do in the
> guest, aio_poll in a vCPU thread does not happen often and I imagine
> snapshot-save is also not a super frequent operation for most people. It
> still takes me a while to trigger the issue by issuing lots of pflash
> writes and running snapshot-save in a loop, I'd guess about 30-60
> snapshots. Another reason might be that generated co-wrappers were less
> common in the past?

May I suggest that you add a test in migration-test (or even better,
create snapshot-test, hint, hint!) if you ever find a way to reproduce
it.

>>> I'm not sure what this was supposed to improve in
>>> the case of snapshots because the VM is stopped anyway.
>
> Is it? Quoting Juan:> d- snapshots are a completely different beast,
> that don't really stop
>> the guest in the same way at that point, and sometimes it shows in
>> this subtle details.

Exhibit A: qemu/migration/migration.c:migration_thread()
Exhibit B: qemu/migration/savevm.c:qemu_savevm_state()

They both call (from a 10000 feet view):

qemu_savevm_state_header()
qemu_savevm_state_setup()
qemu_savevm_iterate()
qemu_savevm_state_cleanup()

But there are subtle details in the middle that are different in both
places, and historically we have forgotten to add fixes to
qemu_savevm_state() (principally because almost nobody uses it, compared
with migration) and because on upstream we test with migration-test,
several avocado tests for migration, and we do the same for Enterprise
distros.

As said before, snapshots are the bastard child of migration, they are
there, but nobody looks at them.

I tried in the past to merge both code paths, but as said, the
differences are tricky and subtle and I never finished that project.

>>> Would anything bad happen if we removed the BQL unlock/lock section in
>>> qemu_savevm_state() again?
>>
>> Dunno.
>>
>> For what is worth, I can say that it survives migration-test, but don't
>> ask me why/how/...
>>
>> Fiona, can you check if it fixes your troubles?
>>
>
> Just removing the single section in qemu_savevm_state() breaks even the
> case where snapshot_save_job_bh() is executed in the main thread,
> because ram_init_bitmaps() will call qemu_mutex_lock_iothread_impl()
> which asserts that it's not already locked.
>
> Also removing the lock/unlock pair in ram_init_bitmaps() seems to work.
> But I'm not sure what else a full semantic revert of commit 9b095037527
> would entail.

I have no idea. I will let Paolo answer this one. I can guess this was a
solution to improve coroutines or rcu or whatever, but I don't remember
the details.

Later, Juan.

^ permalink raw reply	[flat|nested] 21+ messages in thread
end of thread, other threads:[~2023-05-02 12:50 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-19 14:09 QMP (without OOB) function running in thread different from the main thread as part of aio_poll Fiona Ebner
2023-04-19 16:59 ` Paolo Bonzini
2023-04-20  6:11 ` Markus Armbruster
2023-04-20  6:55 ` Paolo Bonzini
2023-04-26 14:31 ` Fiona Ebner
2023-04-27 11:03 ` Kevin Wolf
2023-04-27 12:27 ` Fiona Ebner
2023-04-27 14:36 ` Juan Quintela
2023-04-27 14:56 ` Peter Xu
2023-04-28  7:53 ` Fiona Ebner
2023-04-28  7:23 ` Fiona Ebner
2023-04-28  7:47 ` Juan Quintela
2023-04-28  8:30 ` Kevin Wolf
2023-04-28  8:38 ` Juan Quintela
2023-04-28 12:22 ` Kevin Wolf
2023-04-28 16:54 ` Juan Quintela
2023-05-02 10:03 ` Fiona Ebner
2023-05-02 10:25 ` Fiona Ebner
2023-05-02 10:35 ` Juan Quintela
2023-05-02 12:49 ` Fiona Ebner
2023-05-02 10:30 ` Juan Quintela