qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Fam Zheng <famz@redhat.com>
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, jcody@redhat.com,
	Max Reitz <mreitz@redhat.com>,
	pbonzini@redhat.com, Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH RFC 0/9] block: Rewrite block drain begin/end
Date: Thu, 30 Nov 2017 16:10:38 +0100	[thread overview]
Message-ID: <20171130151038.GE4039@localhost.localdomain> (raw)
In-Reply-To: <20171130143430.GA24067@lemon>

Am 30.11.2017 um 15:34 hat Fam Zheng geschrieben:
> On Thu, 11/30 11:31, Kevin Wolf wrote:
> > What you really mean is probably connected components in the graph, but
> > do we really want to manage merging and splitting object representing
> > connected components when a node is added or removed from the graph?
> > Especially when that graph change occurs in a drain callback?
> > 
> > You can also still easily introduce bugs where graph changes during a
> > drain end up in nodes not being drained, possibly drained twice, you
> > still access the next pointer of a deleted node or you accidentally
> > switch to draining a different component.
> > 
> > It's probably possible to get this right, but essentially you're just
> > switching from iterating a tree to iterating a list. You get roughly the
> > same set of problems that you have to consider as today, and getting it
> > right should be about the same difficulty.
> 
> Not quite. The essential idea is redo the drain begin/end in a correct way:
> 
> bdrv_drained_begin(bs):
>     1.a) stop all sources of new requests in the connected components, including
>         * aio_disable_external()
>         * stop QED timer
>         * stop block job
>     1.b) aio_poll() until:
>         * no request is in flight for bs
>         * no progress is made (any progress may generate new requests)
> 
> bdrv_drained_end(bs):
>     2.a) resume stopped sources of new requests
>     2.b) no need to do aio_poll() because the main loop will move on
> 
> 1.a) can be either done with either a graph recursion like now, or a list
> traversing if managed somewhere, like in this series. Or better, BlockGraph will
> be the manager. The rule is that these operations will not cause graph change,
> so it's an improvement.
> 
> 1.b) doesn't need recursion IMO, only looking at bs->in_flight and the return
> value of aio_poll() should be enough.

The trouble is with 1.b), which can cause graph changes, including nodes
being added to or removed from a connected component. This means that
you can get nodes in the component for which 1.a) hasn't been executed,
or nodes that are outside the component, but for which 1.a) has still
been executed.

You then need to compensate for that somehow, and basically execute 1.a)
for any new nodes (otherwise you have nodes in the component that could
still be issuing requests; skipping drained_end for them is not enough)
and execute 2.*) for the removed ones.

> For 2.a) if the drain begin/end callbacks are mananged in a list, it's
> easy to only call drained_end() for entries that got drained_begin()
> earlier, as shown in patch 2.

Yes, but not draining them was a bug in the first place.

I'll give you that trying to separate 1.a) and 1.b) so that we have only
a single BDRV_POLL_WHILE() after the most critical code, is an
interesting idea. But as long as 1.b) can involve graph changes, it
remains quite tricky.

And of course, it can't solve the problem with graph changes in callers
of bdrv_drained_begin/end because at least one BDRV_POLL_WHILE() will
stay there that can cause graph changes.

> > > > And finally, I don't really think that the recursion is even a problem.
> > > > The problem is with graph changes made by callbacks that drain allows to
> > > > run. With your changes, it might be a bit easier to avoid that
> > > > bdrv_drain() itself gets into trouble due to graph changes, but this
> > > > doesn't solve the problem for any (possibly indirect) callers of
> > > > bdrv_drain().
> > > 
> > > The recursion is the one big place that can be easily broken by graph changes,
> > > fixing this doesn't make the situation any worse. We could still fix the
> > > indirect callers by taking references or by introducing "ubiquitous coroutines".
> > 
> > But hiding a bug in 80% of the cases where it shows isn't enough.
> 
> I think they are separate bugs. And with the one that this series is fixing,
> others (bdrv_drain*() callers') may even not show up, because
> bdrv_drained_begin() crashed already.
> 
> > 
> > I think the only real solution is to forbid graph changes until after
> > any critical operation has completed. I haven't tried it out in
> > practice, but I suppose we could use a CoMutex around them and take it
> > in bdrv_drained_begin/end() and all other places that can get into
> > trouble with graph changes.
> 
> Yes, I agree, but that (using CoMutex around graph change) requires
> everything, especially the defer_to_main_loop_bh, runs in a coroutine
> context, which is exactly what I mean by "introducing 'ubiquitous
> coroutines'", because currently we don't have them.

Is it hard to do, though? Instead of using a BH to switch to the main
loop and outside of coroutine context, you could use aio_co_schedule()
and yield, which would leave you in the main loop, but still in
coroutine context.

Would this have any bad side effects I'm not aware of?

I'm not sure about other places, of course. We'd have to check that.

Kevin

  reply	other threads:[~2017-11-30 15:10 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-29 14:49 [Qemu-devel] [PATCH RFC 0/9] block: Rewrite block drain begin/end Fam Zheng
2017-11-29 14:49 ` [Qemu-devel] [PATCH RFC 1/9] block: Remove unused bdrv_requests_pending Fam Zheng
2017-12-01 12:35   ` [Qemu-devel] [Qemu-block] " Alberto Garcia
2017-11-29 14:49 ` [Qemu-devel] [PATCH RFC 2/9] aio: Add drain begin/end API to AioContext Fam Zheng
2017-11-30 16:07   ` Stefan Hajnoczi
2017-11-29 14:49 ` [Qemu-devel] [PATCH RFC 3/9] blockjob: Implement AioContext drain ops Fam Zheng
2017-11-30 16:09   ` Stefan Hajnoczi
2017-11-29 14:49 ` [Qemu-devel] [PATCH RFC 4/9] throttle: " Fam Zheng
2017-11-29 14:49 ` [Qemu-devel] [PATCH RFC 5/9] qed: " Fam Zheng
2017-11-29 14:49 ` [Qemu-devel] [PATCH RFC 6/9] block: Use aio_context_drained_begin in bdrv_set_aio_context Fam Zheng
2017-11-29 14:49 ` [Qemu-devel] [PATCH RFC 7/9] block: Switch to use AIO drained begin/end API Fam Zheng
2017-11-29 14:49 ` [Qemu-devel] [PATCH RFC 8/9] block: Drop old drained_{begin, end} callbacks Fam Zheng
2017-11-29 14:49 ` [Qemu-devel] [PATCH RFC 9/9] blockjob: Drop unused functions Fam Zheng
2017-11-29 17:25 ` [Qemu-devel] [PATCH RFC 0/9] block: Rewrite block drain begin/end Kevin Wolf
2017-11-30  2:03   ` Fam Zheng
2017-11-30 10:31     ` Kevin Wolf
2017-11-30 14:34       ` Fam Zheng
2017-11-30 15:10         ` Kevin Wolf [this message]
2017-11-30 16:04           ` Paolo Bonzini
2017-12-01  9:51             ` Fam Zheng
2017-12-01 12:24               ` Kevin Wolf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171130151038.GE4039@localhost.localdomain \
    --to=kwolf@redhat.com \
    --cc=famz@redhat.com \
    --cc=jcody@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).