From: Kevin Wolf <kwolf@redhat.com>
To: Juan Quintela <quintela@redhat.com>
Cc: Fiona Ebner <f.ebner@proxmox.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Markus Armbruster <armbru@redhat.com>,
QEMU Developers <qemu-devel@nongnu.org>,
"open list:Block layer core" <qemu-block@nongnu.org>,
Michael Roth <michael.roth@amd.com>, Fam Zheng <fam@euphon.net>,
Stefan Hajnoczi <stefanha@redhat.com>,
Thomas Lamprecht <t.lamprecht@proxmox.com>,
Peter Xu <peterx@redhat.com>
Subject: Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll
Date: Fri, 28 Apr 2023 10:30:26 +0200 [thread overview]
Message-ID: <ZEuEIhe86udi38kx@redhat.com> (raw)
In-Reply-To: <87bkj8bg8g.fsf@secure.mitica>
Am 28.04.2023 um 09:47 hat Juan Quintela geschrieben:
> Fiona Ebner <f.ebner@proxmox.com> wrote:
> > Am 27.04.23 um 16:36 schrieb Juan Quintela:
> >> Fiona Ebner <f.ebner@proxmox.com> wrote:
> >>> Am 27.04.23 um 13:03 schrieb Kevin Wolf:
> >>>> Am 26.04.2023 um 16:31 hat Fiona Ebner geschrieben:
> >>>>> Am 20.04.23 um 08:55 schrieb Paolo Bonzini:
> >>
> >> Hi
> >>
> >>> Our function is a custom variant of saving a snapshot and uses
> >>> qemu_savevm_state_setup(), which is why the qemu_mutex_unlock_iothread()
> >>> is there. I looked for inspiration for how upstream does things and it
> >>> turns out that upstream QEMU v8.0.0 has essentially the same issue with
> >>> snapshot-save. When snapshot_save_job_bh runs in a vCPU thread instead
> >>> of the main thread, the situation is the same: after
> >>> qemu_mutex_unlock_iothread(), qemu_get_current_aio_context() will return
> >>> 0x0 and then the assertion in the AIO_WAIT_WHILE_INTERNAL macro fails
> >>> (this time the generated coroutine wrapper is bdrv_writev_vmstate)[0].
> >>>
> >>>
> >>> So all bottom halves scheduled for the main thread's AioContext can
> >>> potentially get to run in a vCPU thread and need to be very careful with
> >>> things like qemu_mutex_unlock_iothread.
> >>>
> >>> Is qemu_get_current_aio_context() returning 0x0 expected? I haven't
> >>> looked into why it happens yet. Does there need to be a way to drop the
> >>> BQL without also giving up the main thread's AioContext or would it be
> >>> enough to re-acquire the context?
> >>>
> >>> CC-ing Juan as the migration maintainer.
> >>
> >> This is the world backwards.
> >> The tradition is that migration people blame block layer people for
> >> breaking things and for help, not the other way around O:-)
> >
> > Sorry, if I didn't provide enough context/explanation. See below for my
> > attempt to re-iterate. I CC'ed you, because the issue happens as part of
> > snapshot-save and in particular the qemu_mutex_unlock_iothread call in
> > qemu_savevm_state is one of the ingredients leading to the problem.
>
> This was a joke O:-)
>
> >> To see that I am understading this right:
> >>
> >> - you create a thread
> >> - that calls a memory_region operation
> >> - that calls a device write function
> >> - that calls the block layer
> >> - that creates a snapshot
> >> - that calls the migration code
> >> - that calls the block layer again
> >>
> >> Without further investigation, I have no clue what is going on here,
> >> sorry.
> >>
> >> Later, Juan.
> >>
> >
> > All I'm doing is using QEMU (a build of upstream's v8.0.0) in intended
> > ways, I promise! In particular, I'm doing two things at the same time
> > repeatedly:
> > 1. Write to a pflash drive from within the guest.
> > 2. Issue a snapshot-save QMP command (in a way that doesn't lead to an
> > early error).
> >
> > (I actually also used a debugger to break on pflash_update and
> > snapshot_save_job_bh, manually continuing until I triggered the
> > problematic situation. It's very racy, because it depends on the host OS
> > to switch threads at the correct time.)
>
> I think the block layer is the problem (famous last words)
>
> >
> > Now we need to be aware of two things:
> > 1. As discussed earlier in the mail thread, if the host OS switches
> > threads at an inconvenient time, it can happen that a bottom half
> > scheduled for the main thread's AioContext can be executed as part of a
> > vCPU thread's aio_poll.
> > 2. Generated coroutine wrappers for block layer functions spawn the
> > coroutine and use AIO_WAIT_WHILE/aio_poll to wait for it to finish.
> >
> > What happens in the backtrace above is:
> > 1. The write to the pflash drive uses blk_pwrite which leads to an
> > aio_poll in the vCPU thread.
> > 2. The snapshot_save_job_bh bottom half, that was scheduled for the main
> > thread's AioContext, is executed as part of the vCPU thread's aio_poll.
> > 3. qemu_savevm_state is called.
> > 4. qemu_mutex_unlock_iothread is called. Now
> > qemu_get_current_aio_context returns 0x0. Usually, snapshot_save_job_bh
> > runs in the main thread, in which case qemu_get_current_aio_context
> > still returns the main thread's AioContext at this point.
>
> I am perhaps a bit ingenuous here, but it is there a way to convince
> qemu that snapshot_save_job_bh *HAS* to run on the main thread?
I believe we're talking about a technicality here. I asked another more
fundamental question that nobody has answered yet:
Why do you think that it's ok to call bdrv_writev_vmstate() without
holding the BQL?
Because if we come to the conclusion that it's not ok (which is what I
think), then it doesn't matter whether we violate the condition in the
main thread or a vcpu thread. It's wrong in both cases, just the failure
mode differs - one crashes spectacularly with an assertion failure, the
other has a race condition. Moving from the assertion failure to a race
condition is not a proper fix.
> > 5. bdrv_writev_vmstate is executed as part of the usual savevm setup.
> > 6. bdrv_writev_vmstate is a generated coroutine wrapper, so it uses
> > AIO_WAIT_WHILE.
> > 7. The assertion to have the main thread's AioContext inside the
> > AIO_WAIT_WHILE macro fails.
Kevin
next prev parent reply other threads:[~2023-04-28 8:31 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-19 14:09 QMP (without OOB) function running in thread different from the main thread as part of aio_poll Fiona Ebner
2023-04-19 16:59 ` Paolo Bonzini
2023-04-20 6:11 ` Markus Armbruster
2023-04-20 6:55 ` Paolo Bonzini
2023-04-26 14:31 ` Fiona Ebner
2023-04-27 11:03 ` Kevin Wolf
2023-04-27 12:27 ` Fiona Ebner
2023-04-27 14:36 ` Juan Quintela
2023-04-27 14:56 ` Peter Xu
2023-04-28 7:53 ` Fiona Ebner
2023-04-28 7:23 ` Fiona Ebner
2023-04-28 7:47 ` Juan Quintela
2023-04-28 8:30 ` Kevin Wolf [this message]
2023-04-28 8:38 ` Juan Quintela
2023-04-28 12:22 ` Kevin Wolf
2023-04-28 16:54 ` Juan Quintela
2023-05-02 10:03 ` Fiona Ebner
2023-05-02 10:25 ` Fiona Ebner
2023-05-02 10:35 ` Juan Quintela
2023-05-02 12:49 ` Fiona Ebner
2023-05-02 10:30 ` Juan Quintela
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZEuEIhe86udi38kx@redhat.com \
--to=kwolf@redhat.com \
--cc=armbru@redhat.com \
--cc=f.ebner@proxmox.com \
--cc=fam@euphon.net \
--cc=michael.roth@amd.com \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
--cc=stefanha@redhat.com \
--cc=t.lamprecht@proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).