From: Kevin Wolf <kwolf@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-block@nongnu.org, mreitz@redhat.com, famz@redhat.com,
stefanha@redhat.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 07/19] block: Really pause block jobs on drain
Date: Thu, 12 Apr 2018 13:11:43 +0200 [thread overview]
Message-ID: <20180412111143.GB5004@localhost.localdomain> (raw)
In-Reply-To: <f30ff8e9-4273-c01b-ae67-5784d5914c7a@redhat.com>
Am 12.04.2018 um 12:12 hat Paolo Bonzini geschrieben:
> On 12/04/2018 11:51, Kevin Wolf wrote:
> > Am 12.04.2018 um 10:37 hat Paolo Bonzini geschrieben:
> >> On 11/04/2018 18:39, Kevin Wolf wrote:
> >>> +bool bdrv_drain_poll(BlockDriverState *bs, bool top_level)
> >>> {
> >>> /* Execute pending BHs first and check everything else only after the BHs
> >>> * have executed. */
> >>> - while (aio_poll(bs->aio_context, false));
> >>> + if (top_level) {
> >>> + while (aio_poll(bs->aio_context, false));
> >>> + }
> >>> +
> >>> + if (bdrv_parent_drained_poll(bs)) {
> >>> + return true;
> >>> + }
> >>> +
> >>> return atomic_read(&bs->in_flight);
> >>> }
> >>>
> >>
> >> Up until now I liked very much this series, but I like this patch a bit
> >> less for two reasons.
> >>
> >> 1) I think I would prefer to have the !top_level case in a separate
> >> function---making the API easier to use in the BdrvChildRole callback
> >> because there is no need to pass false.
> >
> > Basically just move the aio_poll() out to a different function that
> > calls bdrv_drain_poll afterwards? Maybe it's a bit cleaner, yes.
> > However, see below.
>
> Yes.
>
> >> In addition, the callback is not really polling anything, but rather
> >> returning whether the children are quiescent. So a better name would
> >> be bdrv_children_drained and c->role->drained.
> >
> > Why isn't it polling? We're actively checking the state of the node, and
> > we keep calling the callback until it has the expected state. Would it
> > only become polling for you if the loop were in this function rather
> > than the its caller?
>
> It's just checking the status, it's not invoking the event loop. The
> polling (in the sense of aio_poll or AIO_POLL_WHILE) is done elsewhere,
> this function is just the condition. It's just nomenclature I guess.
Yes, it doesn't poll the AioContext events. I don't think that the name
implies that, though. It does poll the state of the block node and its
users.
> >> 2) Worse, the main idea behind the first drain restructuring was that
> >> draining could proceed in topological order: first drain the roots' I/O,
> >> then call bdrv_drain to send the last requests to their children, then
> >> recurse. It is not clear to me why you need to introduce this recursive
> >> step, which is also O(n^2) in the worst case.
> >
> > I need to introduce it because it fixes the bug that we don't wait until
> > the parents are actually quiesced and don't send new requests any more.
> > I don't see how this could be fixed without going to the parents.
>
> Yes, you do need to go to the parents. I don't understand however why
> you need more than a walk of the graph in parent-before-child order
> (which is a topological order, so it is reverse depth-first order and
> it's easy to do the walk in a recursive function). If you're draining X
> below:
>
> A
> |
> B C
> \ /
> X
>
> then you start by draining A/B/C in topological order (so A before B).
> If B happens to be already quiescent, you can skip not only B but A too.
Are we talking only about bdrv_drain_poll() here or the complete
bdrv_do_drained_begin() call?
If you don't want to recurse separately for .drain_begin and the polling
phase, but keep everything in one function like it's before this series,
you can't skip anything just because it's already quiescent. You don't
have control over that other drain and it might end too early; also,
.drain_begin/end must come in pairs.
If we're only talking about .drain_poll, then in theory you could skip A
if B knows that it's fully quiesced (i.e. not only .drain_begin was
called, but also .drain_poll has returned false at least once). We don't
store this information anywhere, though, and I'm not sure if the saved
recursion would be worth additional state.
> If the nodes disappear or move elsewhere in the graph it's okay, you've
> just done useless work.
Not really. The node is drained now and bdrv_do_drained_end() won't end
the drained section because the moved node isn't reachable any more from
this original node.
> When you're done you ensure that every parent
> is quiescent, and if so you're done. If it's not, , a new parent
> appeared---drain that too, using the same parent-before-child order, and
> loop.
That might work if we only ever were in a single drained section and
BlockDriverState had a bool quiesced. What we do have is a
quiesce_count, and when you look at the quiesce_count of a node, you
can't tell whether it came from you or someone else. So there is no way
to tell whether a parent was already drained by you or not.
And to be honest, even if it worked, it would still seem kind of
complicated to me. Just calling .drain_begin everywhere first and then
polling until nobody has anything in flight any more feels much simpler
conceptually than keeping track of which nodes we drained and repeating
that until no new nodes appear, and somehow undraining the nodes that
disappeared, but are potentially still referenced from somewhere else.
> Well, there is one gotcha: bdrv_ref protects against disappearance, but
> bdrv_ref/bdrv_unref are not thread-safe. Am I missing something else?
Apart from the above, if we do an extra bdrv_ref/unref we'd also have
to keep track of all the nodes that we've referenced so that we unref
the same nodes again, even if the graph has changes.
So essentially you'd be introducing a new list of BDSes that we have to
manage and then check for every reachable node whether it's already in
that list or not, and for every node in the list whether it's still
reachable. I don't really see how that's going to reduce the complexity.
> > Is the O(n²) that you mean that we recursively iterate all children in
> > bdrv_do_drained_begin() (or later in the series in bdrv_drain_poll()),
> > and then come back from the children when they poll their parents?
>
> Yes.
>
> Paolo
>
> > We could do the same thing as for bdrv_parent_drained_begin(), i.e. pass
> > the parent that we came from (BdrvChild *parent instead of bool
> > top_level, NULL instead of top_level=true) and then skip that parent
> > while calling the BdrvChildRole .drain_poll callbacks. Would that
> > address your concerns?
Ok, then this should be the solution. I can implement it for v2.
Kevin
next prev parent reply other threads:[~2018-04-12 11:12 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-04-11 16:39 [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 Kevin Wolf
2018-04-11 16:39 ` [Qemu-devel] [PATCH 01/19] test-bdrv-drain: bdrv_drain() works with cross-AioContext events Kevin Wolf
2018-04-11 16:39 ` [Qemu-devel] [PATCH 02/19] block: Use bdrv_do_drain_begin/end in bdrv_drain_all() Kevin Wolf
2018-04-20 7:07 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-11 16:39 ` [Qemu-devel] [PATCH 03/19] block: Remove 'recursive' parameter from bdrv_drain_invoke() Kevin Wolf
2018-04-20 7:09 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-11 16:39 ` [Qemu-devel] [PATCH 04/19] block: Don't manually poll in bdrv_drain_all() Kevin Wolf
2018-04-11 18:32 ` Eric Blake
2018-04-20 7:11 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-11 16:39 ` [Qemu-devel] [PATCH 05/19] tests/test-bdrv-drain: bdrv_drain_all() works in coroutines now Kevin Wolf
2018-04-11 18:33 ` Eric Blake
2018-04-20 7:12 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-11 16:39 ` [Qemu-devel] [PATCH 06/19] block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE() Kevin Wolf
2018-04-11 17:33 ` Su Hang
2018-04-20 7:17 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-11 16:39 ` [Qemu-devel] [PATCH 07/19] block: Really pause block jobs on drain Kevin Wolf
2018-04-12 8:37 ` Paolo Bonzini
2018-04-12 9:51 ` Kevin Wolf
2018-04-12 10:12 ` Paolo Bonzini
2018-04-12 11:11 ` Kevin Wolf [this message]
2018-04-12 11:30 ` Paolo Bonzini
2018-04-12 11:53 ` Kevin Wolf
2018-04-12 12:02 ` Paolo Bonzini
2018-04-12 13:27 ` Kevin Wolf
2018-04-12 13:42 ` Paolo Bonzini
2018-04-12 14:25 ` Kevin Wolf
2018-04-12 20:44 ` Paolo Bonzini
2018-04-13 8:01 ` Kevin Wolf
2018-04-13 11:05 ` [Qemu-devel] [Qemu-block] " Paolo Bonzini
2018-04-13 12:40 ` Kevin Wolf
2018-04-11 16:39 ` [Qemu-devel] [PATCH 08/19] block: Remove bdrv_drain_recurse() Kevin Wolf
2018-04-12 8:39 ` Paolo Bonzini
2018-04-20 7:20 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-11 16:39 ` [Qemu-devel] [PATCH 09/19] test-bdrv-drain: Add test for node deletion Kevin Wolf
2018-04-20 7:32 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2018-04-11 16:39 ` [Qemu-devel] [PATCH 10/19] block: Drain recursively with a single BDRV_POLL_WHILE() Kevin Wolf
2018-04-12 8:41 ` Paolo Bonzini
2018-04-11 16:39 ` [Qemu-devel] [PATCH 11/19] test-bdrv-drain: Test node deletion in subtree recursion Kevin Wolf
2018-04-11 16:39 ` [Qemu-devel] [PATCH 12/19] block: Don't poll in parent drain callbacks Kevin Wolf
2018-04-11 16:39 ` [Qemu-devel] [PATCH 13/19] test-bdrv-drain: Graph change through parent callback Kevin Wolf
2018-04-11 16:39 ` [Qemu-devel] [PATCH 14/19] block: Defer .bdrv_drain_begin callback to polling phase Kevin Wolf
2018-06-27 14:30 ` Max Reitz
2018-06-29 15:14 ` Kevin Wolf
2018-04-11 16:39 ` [Qemu-devel] [PATCH 15/19] test-bdrv-drain: Test that bdrv_drain_invoke() doesn't poll Kevin Wolf
2018-04-11 16:39 ` [Qemu-devel] [PATCH 16/19] block: Allow AIO_WAIT_WHILE with NULL ctx Kevin Wolf
2018-04-12 8:43 ` Paolo Bonzini
2018-04-11 16:39 ` [Qemu-devel] [PATCH 17/19] block: Move bdrv_drain_all_begin() out of coroutine context Kevin Wolf
2018-04-11 16:39 ` [Qemu-devel] [PATCH 18/19] block: Allow graph changes in bdrv_drain_all_begin/end sections Kevin Wolf
2018-04-12 8:47 ` Paolo Bonzini
2018-04-11 16:39 ` [Qemu-devel] [PATCH 19/19] test-bdrv-drain: Test graph changes in drain_all section Kevin Wolf
2018-04-11 17:05 ` [Qemu-devel] [PATCH 00/19] Drain fixes and cleanups, part 3 no-reply
2018-04-20 7:35 ` Stefan Hajnoczi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180412111143.GB5004@localhost.localdomain \
--to=kwolf@redhat.com \
--cc=famz@redhat.com \
--cc=mreitz@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).