All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Cc: Fam Zheng <fam@euphon.net>,
	Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-block@nongnu.org, qemu-devel@nongnu.org,
	Hanna Reitz <hreitz@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>, John Snow <jsnow@redhat.com>
Subject: Re: [PATCH 5/6] test-bdrv-drain.c: remove test_detach_by_parent_cb()
Date: Fri, 11 Feb 2022 16:38:46 +0100	[thread overview]
Message-ID: <YgaDBkblIA6wu82p@redhat.com> (raw)
In-Reply-To: <20220208153655.1251658-6-eesposit@redhat.com>

Am 08.02.2022 um 16:36 hat Emanuele Giuseppe Esposito geschrieben:
> This test uses a callback of an I/O function (blk_aio_preadv)
> to modify the graph, using bdrv_attach_child.
> This is simply not allowed anymore. I/O cannot change the graph.

The callback of an I/O function isn't I/O, though. It is code _after_
the I/O has completed. If this doesn't work any more, it feels like this
is a bug.

I think it becomes a bit more obvious when you translate the AIO into
the equivalent coroutine function:

    void coroutine_fn foo(...)
    {
        GLOBAL_STATE_CODE();

        blk_co_preadv(...);
        detach_by_parent_aio_cb(...);
    }

This is obviously correct code that must work. Calling an I/O function
from a GS function is allowed, and of course the GS function may
continue to do GS stuff after the I/O function returned.

(Actually, I'm not sure if it really works when blk is not in the main
AioContext, but your API split patches claim that it does, so let's
assume for the moment that this is already true :-))

> Before "block/io.c: make bdrv_do_drained_begin_quiesce static
> and introduce bdrv_drained_begin_no_poll", the test would simply
> be at risk of failure, because if bdrv_replace_child_noperm()
> (called to modify the graph) would call a drain,
> then one callback of .drained_begin() is bdrv_do_drained_begin_quiesce,
> that specifically asserts that we are not in a coroutine.
> 
> Now that we fixed the behavior, the drain will invoke a bh in the
> main loop, so we don't have such problem. However, this test is still
> illegal and fails because we forbid graph changes from I/O paths.
> 
> Once we add the required subtree_drains to protect
> bdrv_replace_child_noperm(), the key problem in this test is in:

Probably a question for a different patch, but why is a subtree drain
required instead of just a normal node drain? It feels like a bigger
hammer than what is needed for replacing a single child.

> acb = blk_aio_preadv(blk, 0, &qiov, 0, detach_by_parent_aio_cb, NULL);
> /* Drain and check the expected result */
> bdrv_subtree_drained_begin(parent_b);
> 
> because the detach_by_parent_aio_cb calls detach_indirect_bh(), that
> modifies the graph and is invoked during bdrv_subtree_drained_begin().
> The call stack is the following:
> 1. blk_aio_preadv() creates a coroutine, increments in_flight counter
> and enters the coroutine running blk_aio_read_entry()
> 2. blk_aio_read_entry() performs the read and then schedules a bh to
>    complete (blk_aio_complete)
> 3. at this point, subtree_drained_begin() kicks in and waits for all
>    in_flight requests, polling
> 4. polling allows the bh to be scheduled, so blk_aio_complete runs
> 5. blk_aio_complete *first* invokes the callback
>    (detach_by_parent_aio_cb) and then decrements the in_flight counter
> 6. Here we have the problem: detach_by_parent_aio_cb modifies the graph,
>    so both bdrv_unref_child() and bdrv_attach_child() will have
>    subtree_drains inside. And this causes a deadlock, because the
>    nested drain will wait for in_flight counter to go to zero, which
>    is only happening once the drain itself finishes.

So the problem has nothing to do with detach_by_parent_aio_cb() being
an I/O function in the sense of locking rules (which it isn't), but with
calling a drain operation while having the in_flight counter increased.

In other words, an AIO callback like detach_by_parent_aio_cb() must
never call drain - and it doesn't before your changes to
bdrv_replace_child_noperm() break it. How confident are we that this
only breaks tests and not real code?

Part of the problem is probably that drain is serving two slightly
different purposes inside the block layer (just make sure that nothing
touches the node any more) and callers outside of the block layer (make
sure that everything has completed and no more callbacks will be
called). This bug sits exactly in the difference between those two
concepts - we're waiting for more than we would have to wait for, and it
causes a deadlock in this combination.

I guess it could be fixed if BlockBackend accounted for requests that
are already completed, but their callback hasn't yet. blk_drain() would
then have to wait for those requests, but blk_root_drained_poll()
wouldn't because these requests don't affect the root node any more.

> Different story is test_detach_by_driver_cb(): in this case,
> detach_by_parent_aio_cb() does not call detach_indirect_bh(),
> but it is instead called as a bh running in the main loop by
> detach_by_driver_cb_drained_begin(), the callback for
> .drained_begin().
> 
> This test was added in 231281ab42 and part of the series
> "Drain fixes and cleanups, part 3"
> https://lists.nongnu.org/archive/html/qemu-block/2018-05/msg01132.html
> but as explained above I believe that it is not valid anymore, and
> can be discarded.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>

I feel throwing tests away just because they don't pass any more is a
bit too simple. :-)

Kevin



  parent reply	other threads:[~2022-02-11 15:39 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-08 15:36 [PATCH 0/6] block: bug fixes in preparation of AioContext removal Emanuele Giuseppe Esposito
2022-02-08 15:36 ` [PATCH 1/6] block/io.c: fix bdrv_child_cb_drained_begin invocations from a coroutine Emanuele Giuseppe Esposito
2022-02-10 14:14   ` Stefan Hajnoczi
2022-02-11 11:54   ` Kevin Wolf
2022-02-14 10:27     ` Emanuele Giuseppe Esposito
2022-02-14 11:57       ` Paolo Bonzini
2022-02-17 15:49         ` Emanuele Giuseppe Esposito
2022-02-14 12:03       ` Kevin Wolf
2022-02-08 15:36 ` [PATCH 2/6] block.c: bdrv_replace_child_noperm: first remove the child, and then call ->detach() Emanuele Giuseppe Esposito
2022-02-10 14:16   ` Stefan Hajnoczi
2022-02-11 12:28   ` Kevin Wolf
2022-02-08 15:36 ` [PATCH 3/6] block.c: bdrv_replace_child_noperm: first call ->attach(), and then add child Emanuele Giuseppe Esposito
2022-02-10 14:16   ` Stefan Hajnoczi
2022-02-11 12:34   ` Kevin Wolf
2022-02-14 10:37     ` Emanuele Giuseppe Esposito
2022-02-08 15:36 ` [PATCH 4/6] test-bdrv-drain.c: adapt test to the coming subtree drains Emanuele Giuseppe Esposito
2022-02-10 14:32   ` Stefan Hajnoczi
2022-02-10 16:02     ` Emanuele Giuseppe Esposito
2022-02-08 15:36 ` [PATCH 5/6] test-bdrv-drain.c: remove test_detach_by_parent_cb() Emanuele Giuseppe Esposito
2022-02-10 14:33   ` Stefan Hajnoczi
2022-02-11 15:38   ` Kevin Wolf [this message]
2022-02-14 11:11     ` Emanuele Giuseppe Esposito
2022-02-14 11:42     ` Paolo Bonzini
2022-02-14 15:28       ` Kevin Wolf
2022-02-14 17:39         ` Paolo Bonzini
2022-02-08 15:36 ` [PATCH 6/6] jobs: ensure sleep in job_sleep_ns is fully performed Emanuele Giuseppe Esposito
2022-02-10 14:43   ` Stefan Hajnoczi
2022-02-10 15:02   ` Vladimir Sementsov-Ogievskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YgaDBkblIA6wu82p@redhat.com \
    --to=kwolf@redhat.com \
    --cc=eesposit@redhat.com \
    --cc=fam@euphon.net \
    --cc=hreitz@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.