From: Kevin Wolf <kwolf@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Emanuele Giuseppe Esposito <eesposit@redhat.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
qemu-block@nongnu.org, Hanna Reitz <hreitz@redhat.com>,
John Snow <jsnow@redhat.com>,
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
Fam Zheng <fam@euphon.net>,
qemu-devel@nongnu.org
Subject: Re: [RFC PATCH v2 0/8] Removal of AioContext lock, bs->parents and ->children: new rwlock
Date: Wed, 18 May 2022 18:14:17 +0200
Message-ID: <YoUbWYfl0Bft3LiU@redhat.com>
In-Reply-To: <6fc3e40e-7682-b9dc-f789-3ca95e0430db@redhat.com>

On 18.05.2022 at 14:43, Paolo Bonzini wrote:
> On 5/18/22 14:28, Emanuele Giuseppe Esposito wrote:
> > For example, all callers of bdrv_open() always take the AioContext lock.
> > Often it is taken very high in the call stack, but it's always taken.
>
> I think it's actually not a problem of who takes the AioContext lock or
> where; the requirements are contradictory:
>
> * IO_OR_GS_CODE() functions, when called from coroutine context, expect to
> be called with the AioContext lock taken (example: bdrv_co_yield_to_drain)
>
> * to call these functions with the lock taken, the code has to run in the
> BDS's home iothread. Attempts to do otherwise result in deadlocks (the
> main loop's AIO_WAIT_WHILEs expect progress from the iothread, which
> cannot happen without releasing the AioContext lock)
>
> * running the code in the BDS's home iothread is not possible for
> GLOBAL_STATE_CODE() functions (unless the BDS home iothread is the main
> thread, but that cannot be guaranteed in general)
>
> > We might suppose that many callbacks are called under drain and in
> > GLOBAL_STATE, which should be enough, but from our experimentation in
> > the previous series we saw that currently not everything is under drain,
> > leaving some operations unprotected (remember assert_graph_writable
> > being temporarily disabled, since drain coverage for
> > bdrv_replace_child_noperm was not 100%?).
> > Therefore we need to add more drains. But isn't drain what we decided to
> > drop at the beginning? Why isn't drain good?
>
> To sum up the patch ordering deadlock that we have right now:
>
> * in some cases, graph manipulations are protected by the AioContext lock
>
> * eliminating the AioContext lock is needed to move callbacks to coroutine
> contexts (see above for the deadlock scenario)
>
> * moving callbacks to coroutine context is needed by the graph rwlock
> implementation
>
> On one hand, we cannot protect the graph across manipulations with a graph
> rwlock without removing the AioContext lock; on the other hand, the
> AioContext lock is what _right now_ protects the graph.
>
> So I'd rather go back to Emanuele's draining approach. It may not be
> beautiful, but it allows progress. Once that is in place, we can remove the
> AioContext lock (which mostly protects virtio-blk/virtio-scsi code right
> now) and reevaluate our next steps.

If we want to use drain for locking, we need to make sure that drain
actually does the job correctly. I see two major problems with it:

The first one is that drain only covers I/O paths, but we need to
protect against _anything_ touching block nodes. This might mean a
massive audit and making sure that everything in QEMU that could
possibly touch a block node is integrated with drain.

I think Emanuele has argued before that because writes to the graph only
happen in the main thread and we believe that currently only I/O
requests are processed in iothreads, this is safe and we don't actually
need to audit everything.

This is true as long as the assumption holds (how do we ensure that
nobody ever introduces non-I/O code touching a block node in an
iothread?) and as long as the graph writer never yields or polls. I
think the latter condition is violated today; a random example is that
adjusting drain counts in bdrv_replace_child_noperm() does poll. Without
cooperation from all relevant places, the effectively locked code
section ends right there, even if the drained section continues. Even if
we can fix this, verifying that the conditions are met everywhere seems
far from trivial.
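
To illustrate with a purely hypothetical sketch (the helper names are
made up; only the drain calls and bdrv_replace_child_noperm() are real):

    bdrv_drained_begin(bs);           /* intended "wrlock" */

    detach_old_child(bs);             /* graph in an intermediate state */

    /* Adjusting the drain counts in here polls, so BHs and fd handlers
     * get to run and can observe, or even modify, the half-updated
     * graph. The effectively locked section ends here, even though the
     * drained section continues. */
    bdrv_replace_child_noperm(child, new_bs);

    attach_new_child(bs);             /* no longer actually protected */

    bdrv_drained_end(bs);
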
And that's exactly my second major concern: Even if we manage to
correctly implement things with drain, I don't see a way to meaningfully
review it. I just don't know how to verify with some confidence that
it's actually correct and covering everything that needs to be covered.

Drain is not really a lock. But if you use it as one, the best it can
provide is advisory locking (callers, inside and outside the block
layer, need to explicitly support drain instead of having the lock
applied somewhere in the block layer functions). And even if all
relevant pieces actually make use of it, it still has an awkward
interface for locking:

/* Similar to rdlock(), but doesn't wait for writers to finish. It is
 * the caller's responsibility to make sure that there are no writers. */
bdrv_inc_in_flight()

/* Similar to wrlock(). Waits for readers to finish. New readers are not
* prevented from starting after it returns. Third parties are politely
* asked not to touch the block node while it is drained. */
bdrv_drained_begin()

(I think the unlock counterparts actually behave as expected from a real
lock.)

Having an actual rdlock() (that waits for writers), and using static
analysis to verify that all relevant places use it (so that wrlock()
doesn't rely on politely asking third parties to leave the node alone)
gave me some confidence that we could verify the result.
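
For comparison, the kind of interface I mean would look roughly like
this (names just for illustration, not necessarily what actual patches
would end up using):

/* Waits for a writer to finish before returning. Any number of readers
 * can hold the lock at the same time. */
bdrv_graph_rdlock()

/* Waits for all readers to finish and prevents new ones from starting
 * until the corresponding unlock. Nobody needs to be politely asked. */
bdrv_graph_wrlock()

This is also something static analysis can check: any graph access
outside of the lock is simply a bug to be flagged.
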
I'm not sure at all how to achieve the same with the drain interface. In
theory, it's possible. But it complicates the conditions so much that
I'm fairly sure that the review would not only be very time consuming,
but that I would also make mistakes during it, rendering it useless.

Maybe throwing some more static analysis at the code could help, I'm not
sure.
It's going to be a bit more complex than with the other approach,
though.

Kevin