From: Kevin Wolf <kwolf@redhat.com>
To: Juan Quintela <quintela@redhat.com>
Cc: Fiona Ebner <f.ebner@proxmox.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Markus Armbruster <armbru@redhat.com>,
QEMU Developers <qemu-devel@nongnu.org>,
"open list:Block layer core" <qemu-block@nongnu.org>,
Michael Roth <michael.roth@amd.com>, Fam Zheng <fam@euphon.net>,
Stefan Hajnoczi <stefanha@redhat.com>,
Thomas Lamprecht <t.lamprecht@proxmox.com>,
Peter Xu <peterx@redhat.com>
Subject: Re: QMP (without OOB) function running in thread different from the main thread as part of aio_poll
Date: Fri, 28 Apr 2023 10:30:26 +0200 [thread overview]
Message-ID: <ZEuEIhe86udi38kx@redhat.com> (raw)
In-Reply-To: <87bkj8bg8g.fsf@secure.mitica>
Am 28.04.2023 um 09:47 hat Juan Quintela geschrieben:
> Fiona Ebner <f.ebner@proxmox.com> wrote:
> > Am 27.04.23 um 16:36 schrieb Juan Quintela:
> >> Fiona Ebner <f.ebner@proxmox.com> wrote:
> >>> Am 27.04.23 um 13:03 schrieb Kevin Wolf:
> >>>> Am 26.04.2023 um 16:31 hat Fiona Ebner geschrieben:
> >>>>> Am 20.04.23 um 08:55 schrieb Paolo Bonzini:
> >>
> >> Hi
> >>
> >>> Our function is a custom variant of saving a snapshot and uses
> >>> qemu_savevm_state_setup(), which is why the qemu_mutex_unlock_iothread()
> >>> is there. I looked for inspiration for how upstream does things and it
> >>> turns out that upstream QEMU v8.0.0 has essentially the same issue with
> >>> snapshot-save. When snapshot_save_job_bh runs in a vCPU thread instead
> >>> of the main thread, the situation is the same: after
> >>> qemu_mutex_unlock_iothread(), qemu_get_current_aio_context() will return
> >>> 0x0 and then the assertion in the AIO_WAIT_WHILE_INTERNAL macro fails
> >>> (this time the generated coroutine wrapper is bdrv_writev_vmstate)[0].
> >>>
> >>>
> >>> So all bottom halves scheduled for the main thread's AioContext can
> >>> potentially get to run in a vCPU thread and need to be very careful with
> >>> things like qemu_mutex_unlock_iothread.
> >>>
> >>> Is qemu_get_current_aio_context() returning 0x0 expected? I haven't
> >>> looked into why it happens yet. Does there need to be a way to drop the
> >>> BQL without also giving up the main thread's AioContext or would it be
> >>> enough to re-acquire the context?
> >>>
> >>> CC-ing Juan as the migration maintainer.
> >>
> >> This is the world backwards.
> >> The tradition is that migration people blame block layer people for
> >> breaking things and for help, not the other way around O:-)
> >
> > Sorry, if I didn't provide enough context/explanation. See below for my
> > attempt to re-iterate. I CC'ed you, because the issue happens as part of
> > snapshot-save and in particular the qemu_mutex_unlock_iothread call in
> > qemu_savevm_state is one of the ingredients leading to the problem.
>
> This was a joke O:-)
>
> >> To see that I am understading this right:
> >>
> >> - you create a thread
> >> - that calls a memory_region operation
> >> - that calls a device write function
> >> - that calls the block layer
> >> - that creates a snapshot
> >> - that calls the migration code
> >> - that calls the block layer again
> >>
> >> Without further investigation, I have no clue what is going on here,
> >> sorry.
> >>
> >> Later, Juan.
> >>
> >
> > All I'm doing is using QEMU (a build of upstream's v8.0.0) in intended
> > ways, I promise! In particular, I'm doing two things at the same time
> > repeatedly:
> > 1. Write to a pflash drive from within the guest.
> > 2. Issue a snapshot-save QMP command (in a way that doesn't lead to an
> > early error).
> >
> > (I actually also used a debugger to break on pflash_update and
> > snapshot_save_job_bh, manually continuing until I triggered the
> > problematic situation. It's very racy, because it depends on the host OS
> > to switch threads at the correct time.)
>
> I think the block layer is the problem (famous last words)
>
> >
> > Now we need to be aware of two things:
> > 1. As discussed earlier in the mail thread, if the host OS switches
> > threads at an inconvenient time, it can happen that a bottom half
> > scheduled for the main thread's AioContext can be executed as part of a
> > vCPU thread's aio_poll.
> > 2. Generated coroutine wrappers for block layer functions spawn the
> > coroutine and use AIO_WAIT_WHILE/aio_poll to wait for it to finish.
> >
> > What happens in the backtrace above is:
> > 1. The write to the pflash drive uses blk_pwrite which leads to an
> > aio_poll in the vCPU thread.
> > 2. The snapshot_save_job_bh bottom half, that was scheduled for the main
> > thread's AioContext, is executed as part of the vCPU thread's aio_poll.
> > 3. qemu_savevm_state is called.
> > 4. qemu_mutex_unlock_iothread is called. Now
> > qemu_get_current_aio_context returns 0x0. Usually, snapshot_save_job_bh
> > runs in the main thread, in which case qemu_get_current_aio_context
> > still returns the main thread's AioContext at this point.
>
> I am perhaps a bit ingenuous here, but it is there a way to convince
> qemu that snapshot_save_job_bh *HAS* to run on the main thread?
I believe we're talking about a technicality here. I asked another more
fundamental question that nobody has answered yet:
Why do you think that it's ok to call bdrv_writev_vmstate() without
holding the BQL?
Because if we come to the conclusion that it's not ok (which is what I
think), then it doesn't matter whether we violate the condition in the
main thread or a vcpu thread. It's wrong in both cases, just the failure
mode differs - one crashes spectacularly with an assertion failure, the
other has a race condition. Moving from the assertion failure to a race
condition is not a proper fix.
> > 5. bdrv_writev_vmstate is executed as part of the usual savevm setup.
> > 6. bdrv_writev_vmstate is a generated coroutine wrapper, so it uses
> > AIO_WAIT_WHILE.
> > 7. The assertion to have the main thread's AioContext inside the
> > AIO_WAIT_WHILE macro fails.
Kevin
next prev parent reply other threads:[~2023-04-28 8:31 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-19 14:09 QMP (without OOB) function running in thread different from the main thread as part of aio_poll Fiona Ebner
2023-04-19 16:59 ` Paolo Bonzini
2023-04-20 6:11 ` Markus Armbruster
2023-04-20 6:55 ` Paolo Bonzini
2023-04-26 14:31 ` Fiona Ebner
2023-04-27 11:03 ` Kevin Wolf
2023-04-27 12:27 ` Fiona Ebner
2023-04-27 14:36 ` Juan Quintela
2023-04-27 14:56 ` Peter Xu
2023-04-28 7:53 ` Fiona Ebner
2023-04-28 7:23 ` Fiona Ebner
2023-04-28 7:47 ` Juan Quintela
2023-04-28 8:30 ` Kevin Wolf [this message]
2023-04-28 8:38 ` Juan Quintela
2023-04-28 12:22 ` Kevin Wolf
2023-04-28 16:54 ` Juan Quintela
2023-05-02 10:03 ` Fiona Ebner
2023-05-02 10:25 ` Fiona Ebner
2023-05-02 10:35 ` Juan Quintela
2023-05-02 12:49 ` Fiona Ebner
2023-05-02 10:30 ` Juan Quintela
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZEuEIhe86udi38kx@redhat.com \
--to=kwolf@redhat.com \
--cc=armbru@redhat.com \
--cc=f.ebner@proxmox.com \
--cc=fam@euphon.net \
--cc=michael.roth@amd.com \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
--cc=stefanha@redhat.com \
--cc=t.lamprecht@proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.