From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Emanuele Giuseppe Esposito <eesposit@redhat.com>,
Kevin Wolf <kwolf@redhat.com>,
qemu-block@nongnu.org, Hanna Reitz <hreitz@redhat.com>,
Stefan Weil <sw@weilnetz.de>, Fam Zheng <fam@euphon.net>,
Paolo Bonzini <pbonzini@redhat.com>,
qemu-devel@nongnu.org, quintela@redhat.com
Subject: Re: [PATCH 2/2] thread-pool: use ThreadPool from the running thread
Date: Thu, 20 Oct 2022 17:22:17 +0100
Message-ID: <Y1F1uU5bAQw80mG0@work-vm>
In-Reply-To: <Y1Frq6R4DFOPWyIY@fedora>
* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Mon, Oct 03, 2022 at 10:52:33AM +0200, Emanuele Giuseppe Esposito wrote:
> >
> >
> > On 30/09/2022 at 17:45, Kevin Wolf wrote:
> > > On 30.09.2022 at 14:17, Emanuele Giuseppe Esposito wrote:
> > >> On 29/09/2022 at 17:30, Kevin Wolf wrote:
> > >>> On 09.06.2022 at 15:44, Emanuele Giuseppe Esposito wrote:
> > >>>> Remove usage of aio_context_acquire by always submitting work items
> > >>>> to the current thread's ThreadPool.
> > >>>>
> > >>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > >>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> > >>>
> > >>> The thread pool is used by things outside of the file-* block drivers,
> > >>> too. Even outside the block layer. Not all of these seem to submit work
> > >>> in the same thread.
> > >>>
> > >>>
> > >>> For example:
> > >>>
> > >>> postcopy_ram_listen_thread() -> qemu_loadvm_state_main() ->
> > >>> qemu_loadvm_section_start_full() -> vmstate_load() ->
> > >>> vmstate_load_state() -> spapr_nvdimm_flush_post_load(), which has:
> > >>>
> > >>> ThreadPool *pool = aio_get_thread_pool(qemu_get_aio_context());
>                            ^^^^^^^^^^^^^^^^^^^
>
> aio_get_thread_pool() isn't thread safe either:
>
> ThreadPool *aio_get_thread_pool(AioContext *ctx)
> {
>     if (!ctx->thread_pool) {
>         ctx->thread_pool = thread_pool_new(ctx);
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Two threads could race in aio_get_thread_pool().
>
> I think post-copy is broken here: it's calling code that was only
> designed to be called from the main loop thread.
>
> I have CCed Juan and David.
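As an aside on the aio_get_thread_pool() race quoted above: it's the usual
check-then-create pattern, and one conservative way to close it is to do the
check under a lock. A minimal sketch, not QEMU code; Ctx, Pool and
ctx_get_pool() are made-up names, and a plain pthread mutex stands in for
whatever lock would actually be appropriate there:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for ThreadPool and AioContext. */
typedef struct Pool {
    int unused;
} Pool;

typedef struct Ctx {
    pthread_mutex_t lock;   /* assumed lock guarding lazy creation */
    Pool *pool;             /* created on first use */
} Ctx;

/*
 * Lazily create the pool. The NULL check and the assignment happen
 * under the same lock, so two racing callers cannot both allocate.
 */
static Pool *ctx_get_pool(Ctx *ctx)
{
    Pool *p;

    pthread_mutex_lock(&ctx->lock);
    if (!ctx->pool) {
        ctx->pool = calloc(1, sizeof(*ctx->pool));
    }
    p = ctx->pool;
    pthread_mutex_unlock(&ctx->lock);
    return p;
}
```

This only covers creation; whether the returned pool may then be used from a
thread other than its owner is the separate question discussed in the rest of
the thread.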
In theory the path that you describe there shouldn't happen - although
there is perhaps not enough protection on the load side to stop it
happening if presented with a bad stream.

This is documented in docs/devel/migration.rst under 'Destination
behaviour'; but to recap, during postcopy load we have a problem that we
need to be able to load incoming iterative (i.e. RAM) pages during the
loading of normal devices, because the loading of a device may access
RAM that's not yet been transferred.

To do that, the device state of all the non-iterative devices (which I
think includes your spapr_nvdimm) is serialised into a separate
migration stream and sent as a 'package'.

We read the package off the stream on the main thread, but don't process
it until we fire off the 'listen' thread - which you spotted the
creation of above; the listen thread now takes over reading the
migration stream to process RAM pages, and since it's in the same
format, it calls qemu_loadvm_state_main() - but it doesn't expect
any devices in that other than the RAM devices; it's just expecting RAM.

In parallel with that, the main thread carries on loading the contents
of the 'package' - and that contains your spapr_nvdimm device (and any
other 'normal' devices); but that's OK because that's the main thread.

Now if something was very broken and sent a header for the spapr-nvdimm
down the main stream rather than into the package then, yes, we'd
trigger your case, but that shouldn't happen.
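
The 'not enough protection on the load side' point could be addressed by
having the loader refuse device sections whenever it is running outside the
main thread. A rough sketch of that check, under assumptions: SectionType,
Section, Loader and load_sections() are hypothetical names for illustration,
not the real qemu_loadvm_state_main() interface, and the real stream format
is more involved than a flat array of sections:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical section kinds; SEC_END terminates a stream. */
typedef enum { SEC_RAM, SEC_DEVICE, SEC_END } SectionType;

typedef struct {
    SectionType type;
    const char *name;
} Section;

/* Which thread is running the load loop. */
typedef enum { LOADER_MAIN_THREAD, LOADER_LISTEN_THREAD } Loader;

/*
 * Walk the sections up to SEC_END. Returns the number of sections
 * accepted, or -1 if a device section is seen by anything other than
 * the main thread (i.e. device state arrived outside the package).
 */
static int load_sections(const Section *s, Loader who)
{
    int loaded = 0;

    for (; s->type != SEC_END; s++) {
        if (s->type == SEC_DEVICE && who != LOADER_MAIN_THREAD) {
            return -1;      /* bad stream: reject instead of loading */
        }
        loaded++;
    }
    return loaded;
}
```

With a check like that, a package holding spapr_nvdimm still loads on the
main thread and a RAM-only stream still loads on the listen thread, but a
stream carrying a device header outside the package is rejected rather than
running post_load code off the main thread.
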
Dave
> > >>> ...
> > >>> thread_pool_submit_aio(pool, flush_worker_cb, state,
> > >>> spapr_nvdimm_flush_completion_cb, state);
> > >>>
> > >>> So it seems to me that we may be submitting work for the main thread
> > >>> from a postcopy migration thread.
> > >>>
> > >>> I believe the other direct callers of thread_pool_submit_aio() all
> > >>> submit work for the main thread and also run in the main thread.
> > >>>
> > >>>
> > >>> For thread_pool_submit_co(), pr_manager_execute() calls it with the pool
> > >>> it gets passed as a parameter. This is still bdrv_get_aio_context(bs) in
> > >>> hdev_co_ioctl() and should probably be changed the same way as for the
> > >>> AIO call in file-posix, i.e. use qemu_get_current_aio_context().
> > >>>
> > >>>
> > >>> We could consider either asserting in thread_pool_submit_aio() that we
> > >>> are really in the expected thread, or like I suggested for LinuxAio drop
> > >>> the pool parameter and always get it from the current thread (obviously
> > >>> this is only possible if migration could in fact schedule the work on
> > >>> its current thread - if it schedules it on the main thread and then
> > >>> exits the migration thread (which destroys the thread pool), that
> > >>> wouldn't be good).
> > >>
> > > >> Dumb question: why not extend the already-existing pool->lock to also
> > > >> cover the necessary fields like pool->head that are accessed by other
> > > >> threads (the only case I could find with thread_pool_submit_aio is the
> > > >> one you pointed out above)?
> > >
> > > Other people are more familiar with this code, but I believe this could
> > > have performance implications. I seem to remember that this code is
> > > careful to avoid locking to synchronise between worker threads and the
> > > main thread.
> > >
> > > > But looking at the patch again, I actually have a dumb question, too:
> > > > The locking you're removing is in thread_pool_completion_bh(). As this
> > > > is a BH, it's running in the ThreadPool's context either way, no matter
> > > > which thread called thread_pool_submit_aio().
> > >
> > > I'm not sure what this aio_context_acquire/release pair is actually
> > > supposed to protect. Paolo's commit 1919631e6b5 introduced it. Was it
> > > just more careful than it needs to be?
> > >
> >
> > I think the goal is still to protect pool->head, but if so the
> > aiocontext lock is put in the wrong place, because as you said the bh is
> > always run in the thread pool context. Otherwise it seems to make no sense.
> >
> > On the other hand, thread_pool_submit_aio could be called by other
> > threads on behalf of the main loop, which means pool->head could be
> > modified (iothread calls thread_pool_submit_aio) while being read by the
> > main loop (another worker thread schedules thread_pool_completion_bh).
> >
> > What are the performance implications? I mean, if the AioContext lock in
> > the bh is actually useful and the bh really has to wait to take it, then
> > that lock, which is taken in many more places throughout the block layer,
> > won't be better than extending pool->lock, I guess.
>
> thread_pool_submit_aio() is missing documentation on how it is supposed
> to be called.
>
> Taking pool->lock is conservative and fine in the short-term.
>
> In the longer term we need to clarify how thread_pool_submit_aio() is
> supposed to be used and remove locking to protect pool->head if
> possible.
>
> A bunch of the event loop APIs are thread-safe (aio_set_fd_handler(),
> qemu_bh_schedule(), etc) so it's somewhat natural to make
> thread_pool_submit_aio() thread-safe too. However, it would be nice to
> avoid synchronization: existing callers mostly call it from the same
> event loop thread that runs the BH, and we can avoid locking in that
> case.
>
> Stefan
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK