From: Kevin Wolf <kwolf@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: qemu-block@nongnu.org, eblake@redhat.com, eesposit@redhat.com,
	pbonzini@redhat.com, vsementsov@yandex-team.ru,
	qemu-devel@nongnu.org
Subject: Re: [PATCH 05/21] block: Introduce bdrv_schedule_unref()
Date: Tue, 5 Sep 2023 18:39:27 +0200	[thread overview]
Message-ID: <ZPdZv8sQ9JmaPDk4@redhat.com> (raw)
In-Reply-To: <20230822190143.GH727224@fedora>

On 22.08.2023 at 21:01, Stefan Hajnoczi wrote:
> On Thu, Aug 17, 2023 at 02:50:04PM +0200, Kevin Wolf wrote:
> > bdrv_unref() is called by a lot of places that need to hold the graph
> > lock (it naturally happens in the context of operations that change the
> > graph). However, bdrv_unref() takes the graph writer lock internally, so
> > it can't actually be called while already holding a graph lock without
> > causing a deadlock.
> > 
> > bdrv_unref() also can't just become GRAPH_WRLOCK because it drains the
> > node before closing it, and draining requires that the graph is
> > unlocked.
> > 
> > The solution is to defer deleting the node until we don't hold the lock
> > any more and draining is possible again.
> > 
> > Note that keeping images open for longer than necessary can create
> > problems, too: You can't open an image again before it is really closed
> > (if image locking didn't prevent it, it would cause corruption).
> > Immediately reopening an image after closing it happens at least during
> > bdrv_open() and bdrv_co_create().
> > 
> > In order to solve this problem, make sure to run the deferred unref in
> > bdrv_graph_wrunlock(), i.e. the first possible place where we can drain
> > again. This is also why bdrv_schedule_unref() is marked GRAPH_WRLOCK.
> > 
> > The output of iotest 051 is updated because the additional polling
> > changes the order of HMP output, resulting in a new "(qemu)" prompt in
> > the test output that was previously on a separate line and filtered out.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  include/block/block-global-state.h |  1 +
> >  block.c                            |  9 +++++++++
> >  block/graph-lock.c                 | 23 ++++++++++++++++-------
> >  tests/qemu-iotests/051.pc.out      |  6 +++---
> >  4 files changed, 29 insertions(+), 10 deletions(-)
> > 
> > diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
> > index f347199bff..e570799f85 100644
> > --- a/include/block/block-global-state.h
> > +++ b/include/block/block-global-state.h
> > @@ -224,6 +224,7 @@ void bdrv_img_create(const char *filename, const char *fmt,
> >  void bdrv_ref(BlockDriverState *bs);
> >  void no_coroutine_fn bdrv_unref(BlockDriverState *bs);
> >  void coroutine_fn no_co_wrapper bdrv_co_unref(BlockDriverState *bs);
> > +void GRAPH_WRLOCK bdrv_schedule_unref(BlockDriverState *bs);
> >  void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child);
> >  BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
> >                               BlockDriverState *child_bs,
> > diff --git a/block.c b/block.c
> > index 6376452768..9c4f24f4b9 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -7033,6 +7033,15 @@ void bdrv_unref(BlockDriverState *bs)
> >      }
> >  }
> >  
> > +void bdrv_schedule_unref(BlockDriverState *bs)
> 
> Please add a doc comment explaining when and why this should be used.

Ok.
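
Something along these lines is what I have in mind (treat the exact
wording as a sketch rather than what will end up in the header):

/*
 * Release a BlockDriverState reference while holding the graph write lock.
 *
 * bdrv_unref() takes the writer lock internally, so calling it with the
 * lock already held would deadlock, and it can't simply be marked
 * GRAPH_WRLOCK because closing the node drains it, which requires the
 * graph to be unlocked. bdrv_schedule_unref() therefore only schedules
 * the bdrv_unref() call; bdrv_graph_wrunlock() runs it as soon as the
 * lock is dropped, so callers can rely on the unref having completed by
 * the time they return from unlocking.
 */
void GRAPH_WRLOCK bdrv_schedule_unref(BlockDriverState *bs);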

> > +{
> > +    if (!bs) {
> > +        return;
> > +    }
> > +    aio_bh_schedule_oneshot(qemu_get_aio_context(),
> > +                            (QEMUBHFunc *) bdrv_unref, bs);
> > +}
> > +
> >  struct BdrvOpBlocker {
> >      Error *reason;
> >      QLIST_ENTRY(BdrvOpBlocker) list;
> > diff --git a/block/graph-lock.c b/block/graph-lock.c
> > index 5e66f01ae8..395d387651 100644
> > --- a/block/graph-lock.c
> > +++ b/block/graph-lock.c
> > @@ -163,17 +163,26 @@ void bdrv_graph_wrlock(BlockDriverState *bs)
> >  void bdrv_graph_wrunlock(void)
> >  {
> >      GLOBAL_STATE_CODE();
> > -    QEMU_LOCK_GUARD(&aio_context_list_lock);
> >      assert(qatomic_read(&has_writer));
> >  
> > +    WITH_QEMU_LOCK_GUARD(&aio_context_list_lock) {
> > +        /*
> > +         * No need for memory barriers, this works in pair with
> > +         * the slow path of rdlock() and both take the lock.
> > +         */
> > +        qatomic_store_release(&has_writer, 0);
> > +
> > +        /* Wake up all coroutine that are waiting to read the graph */
> 
> s/coroutine/coroutines/

I only changed the indentation, but I guess I can just fix it while I
touch it.

> > +        qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
> > +    }
> > +
> >      /*
> > -     * No need for memory barriers, this works in pair with
> > -     * the slow path of rdlock() and both take the lock.
> > +     * Run any BHs that were scheduled during the wrlock section and that
> > +     * callers might expect to have finished (e.g. bdrv_unref() calls). Do this
> 
> Referring directly to bdrv_schedule_unref() would help make it clearer
> what you mean.
> 
> > +     * only after restarting coroutines so that nested event loops in BHs don't
> > +     * deadlock if their condition relies on the coroutine making progress.
> >       */
> > -    qatomic_store_release(&has_writer, 0);
> > -
> > -    /* Wake up all coroutine that are waiting to read the graph */
> > -    qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
> > +    aio_bh_poll(qemu_get_aio_context());
> 
> Keeping a dedicated list of BDSes pending unref seems cleaner than using
> the aio_bh_poll(). There's a lot of stuff happening in the event loop
> and I wonder if those other BHs might cause issues if they run at the
> end of bdrv_graph_wrunlock(). The change to qemu-iotests 051's output is
> an example of this, but other things could happen too (e.g. monitor
> command processing).

I would argue that it's no worse than the old code: the bdrv_unref()
that we're replacing would already run a nested event loop if it dropped
the last reference. We have only moved this a little later, making the
part of the code that doesn't run an event loop a little larger. The
difference is that we now run it unconditionally rather than only in a
special case, but I think that's actually an improvement (in terms of
test coverage etc.).
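
To make the intended calling pattern concrete, here is a rough sketch of
a typical caller (not a hunk from this series):

/* Old pattern: the unref must not happen under the writer lock, so it
 * runs (and possibly drains in a nested event loop) only after
 * unlocking. */
bdrv_graph_wrunlock();
bdrv_unref(bs);

/* New pattern: the unref is requested under the lock and actually runs
 * from the aio_bh_poll() call at the end of bdrv_graph_wrunlock(). */
bdrv_schedule_unref(bs);
bdrv_graph_wrunlock();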

We run nested event loops in all kinds of different places without
thinking much about it. If we're worried about it here, what do we do
about all these other places?

Anyway, if you think that we should process only bdrv_schedule_unref()
here, how would you approach this? Restrict it to bdrv_schedule_unref()
in the hope that this is the only operation we'll need to delay, or
implement another BH mechanism from scratch in graph-lock.c? In theory
we could also add another AioContext where actual BHs can be queued and
that is only polled here, but the iohandler context is already painful
enough that I don't necessarily want to create another thing like it.
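
Just to spell out the first option, a dedicated pending-unref list would
presumably look something like the following (purely a hypothetical
sketch, names made up for illustration):

/* List filled under the writer lock instead of scheduling a BH. */
static GSList *bdrv_pending_unrefs;

void bdrv_schedule_unref(BlockDriverState *bs)
{
    if (!bs) {
        return;
    }
    bdrv_pending_unrefs = g_slist_prepend(bdrv_pending_unrefs, bs);
}

/* ...and at the end of bdrv_graph_wrunlock(), after waking the readers: */
while (bdrv_pending_unrefs) {
    BlockDriverState *bs = bdrv_pending_unrefs->data;
    bdrv_pending_unrefs = g_slist_delete_link(bdrv_pending_unrefs,
                                              bdrv_pending_unrefs);
    bdrv_unref(bs);
}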

Kevin
