From: Fam Zheng <famz@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: kwolf@redhat.com, qemu-block@nongnu.org, jcody@redhat.com,
qemu-devel@nongnu.org, mreitz@redhat.com, pbonzini@redhat.com
Subject: Re: [Qemu-devel] Block layer complexity: what to do to keep it under control?
Date: Thu, 30 Nov 2017 17:47:09 +0800 [thread overview]
Message-ID: <20171130094709.GA20286@lemon> (raw)
In-Reply-To: <20171129120018.GB2601@stefanha-x1.localdomain>
On Wed, 11/29 12:00, Stefan Hajnoczi wrote:
> On Wed, Nov 29, 2017 at 11:55:02AM +0800, Fam Zheng wrote:
> > As we move forwards with new features in the block layer, the chances of tricky
> > bugs happening have been increasing alongside - block jobs, coroutines,
> > throttling, AioContext, op blockers and image locking combined together make a
> > large and complex picture that is hard to fully understand and work with. Some
> > bugs we've encountered are quite challenging already. Examples are:
> >
> > - segfault in parallel blockjobs (iotest 30)
> > https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg01144.html
> >
> > - Intermittent hang of iotest 194 (bdrv_drain_all after non-shared storage
> > migration)
> > https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg01626.html
> >
> > - Drainage in bdrv_replace_child_noperm()
> > https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg00868.html
> >
> > - Regression from 2.8: stuck in bdrv_drain()
> > https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg02193.html
> >
> > So in principle, what should we do to make the block layer easy to understand,
> > develop with and debug?
>
> The assumptions that the code relies on are unclear so it's easy to
> introduce new bugs.
Is that one thing we could do better in documenting?
>
> We are at a point where code review isn't finding certain bugs because
> no single person knows all the assumptions. Previously the problem was
> contained because maintainers spotted problems before patches were
> merged.
>
> This is not primarily a documentation problem though. We cannot
> document our way out of this because no single person (patch author or
> code reviewer) can know or check everything anymore due to the scale.
>
> I think it's a (lack of) design problem because we have many incomplete
> abstractions like block jobs, IOThreads, block graph, image locking,
> etc. They do not cover all possibly states and interactions today.
> Extending them leads to complex bugs.
>
> A little progress has been made with defining higher-level APIs for
> block drivers and block jobs. This way they either don't deal with
> low-level details of the concurrency and event loop models (e.g.
> bdrv_coroutine_enter()) or there is an interface that prompts them to
> integrate properly like bdrv_attach/detach_aio_context().
Sounds good.
>
> Event loops and coroutines are good but they should not be used directly
> by block drivers and block jobs. We need safe, high-level APIs that
> implement commonly-used operations.
>
> > - Documentation
> >
> > There is no central developer doc about block layer, especially how all pieces
> > fit together. Having one will make it a lot easier for new contributors to
> > understand better. Of course, we're facing the old problem: the code is
> > moving, maintaining an updated document needs effort.
> >
> > Idea: add ./doc/deve/block.txt?
>
> IOThreads and AioContexts are addressed here:
> docs/devel/multiple-iothreads.txt
>
> The game has become significantly more complex than what the document
> describes. It's lacking aio_co_wake() and aio_co_schedule() for
> example.
>
> > - Simplified code, or more orthogonal/modularized architecture.
> >
> > Each aspect of block layer is complex enough so isolating them as much as
> > possible is a reasonable approach to control the complexity. Block jobs and
> > throttling becoming block filters is a good example, we should identify more.
> >
> > Idea: rethink event loops. Create coroutines ubiquitously (for example for
> > each fd handler, BH and timer), so that many nested aio_poll() can be removed.
> >
> > Crazy idea: move the whole block layer to a vhost process, and implement
> > existing features differently, especially in terms of multi-threading (hint:
> > rust?).
>
> A reimplementation will not solve the problem because:
>
> 1. If it still has the same feature set and requirements then the level
> of complexity will be comparable.
>
> 2. We can reduce accidental (inessential) complexity by continuing the
> various efforts around the block graph, block jobs, multi-queue block
> layer with an eye towards higher level APIs.
Starting over is certainly not the motivation to do qemu-vhost, but it would be
an opportunity to use different async/concurrency paradigms if that is going to
happen. I think in current block layer, event loop + coroutine is a good
combination, but having nested aio_poll()'s made it worse, then mixing IOThreads
in makes it a lot more complicated.
Fam
next prev parent reply other threads:[~2017-11-30 9:47 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-11-29 3:55 [Qemu-devel] Block layer complexity: what to do to keep it under control? Fam Zheng
2017-11-29 6:30 ` Jeff Cody
2017-11-29 12:16 ` Stefan Hajnoczi
2017-11-29 12:22 ` Paolo Bonzini
2017-11-29 12:00 ` Stefan Hajnoczi
2017-11-29 12:24 ` Paolo Bonzini
2017-11-29 13:24 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2017-11-29 13:41 ` [Qemu-devel] " Kevin Wolf
2017-11-29 19:58 ` Dr. David Alan Gilbert
2017-11-30 9:47 ` Fam Zheng [this message]
2017-11-30 14:19 ` Stefan Hajnoczi
2017-12-01 10:16 ` Fam Zheng
2017-12-01 14:08 ` Stefan Hajnoczi
2017-12-01 15:00 ` Fam Zheng
2017-12-01 17:03 ` Paolo Bonzini
2017-12-01 19:03 ` Peter Maydell
2017-12-04 10:41 ` Stefan Hajnoczi
2017-12-01 19:27 ` Eric Blake
2017-12-04 10:16 ` Stefan Hajnoczi
2017-12-04 10:32 ` Peter Maydell
2017-11-29 12:32 ` Daniel P. Berrange
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171130094709.GA20286@lemon \
--to=famz@redhat.com \
--cc=jcody@redhat.com \
--cc=kwolf@redhat.com \
--cc=mreitz@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.