Re: [Qemu-devel] Block layer complexity: what to do to keep it under control?

From: Kevin Wolf <kwolf@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>,
	Fam Zheng <famz@redhat.com>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org, jcody@redhat.com,
	mreitz@redhat.com, eblake@redhat.com
Subject: Re: [Qemu-devel] Block layer complexity: what to do to keep it under control?
Date: Wed, 29 Nov 2017 14:41:30 +0100
Message-ID: <20171129134130.GC3753@localhost.localdomain>
In-Reply-To: <7ccb7f4a-b576-349f-655c-f741ec3a0dff@redhat.com>

On 29.11.2017 at 13:24, Paolo Bonzini wrote:
> On 29/11/2017 13:00, Stefan Hajnoczi wrote:
> > We are at a point where code review isn't finding certain bugs because
> > no single person knows all the assumptions.  Previously the problem was
> > contained because maintainers spotted problems before patches were
> > merged.
> > 
> > This is not primarily a documentation problem though.  We cannot
> > document our way out of this because no single person (patch author or
> > code reviewer) can know or check everything anymore due to the scale.
> > 
> > I think it's a (lack of) design problem because we have many incomplete
> > abstractions like block jobs, IOThreads, block graph, image locking,
> > etc.  They do not cover all possible states and interactions today.
> > Extending them leads to complex bugs.
> 
> I think the main interactions are:
> 
> 1) block graph modifications and drain.  This has always been carnage.
> Implementing BlockBackend isolation instead of drain would probably be
> a starting point to fix it, because IIRC there are extremely few cases
> where we really need "drain" semantics.

I think it's not just specifically drain, but nested event loops in
general. Drain is just more prominent because it recursively affects the
whole tree and actively waits for callbacks, so if anything can go
wrong, it will certainly affect drain, too.

The big problem I see here is that we have never defined in which places
or under which conditions it's allowed to make changes to the graph.
This means that callers never know when to use an extra bdrv_ref/unref
pair, when to expect that child references change in the middle of the
operation etc.

Maybe what we need there are coroutine locks that make sure that, e.g.,
a block job completion simply has to wait until a drain has completed
before the graph change is actually executed. We need to make sure that
these locks don't deadlock the drain operation, but as long as these
things run in a separate coroutine (like the block job coroutine), it
should be okay.
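
A minimal sketch of that idea, written with Python asyncio rather than
QEMU's C coroutines so that it runs on its own. The names DrainGate,
drain_begin, drain_end and change_graph are hypothetical and are not
QEMU APIs. The sketch only shows the ordering: a graph change requested
during a drained section is deferred until the drain has ended.

```python
import asyncio


class DrainGate:
    """Hypothetical gate: graph changes wait while a drain section is active."""

    def __init__(self):
        self._drains = 0                  # nesting depth of active drains
        self._cond = asyncio.Condition()

    async def drain_begin(self):
        async with self._cond:
            self._drains += 1

    async def drain_end(self):
        async with self._cond:
            self._drains -= 1
            if self._drains == 0:
                self._cond.notify_all()   # wake deferred graph changes

    async def change_graph(self, change):
        async with self._cond:
            # Block this coroutine (not the drain) until no drain is active.
            await self._cond.wait_for(lambda: self._drains == 0)
            change()                      # synchronous, runs with the gate held


async def demo():
    gate = DrainGate()
    graph = {"top": "base"}               # stand-in for a child reference
    log = []

    async def job_completion():
        log.append("job: requests graph change")
        await gate.change_graph(lambda: graph.update(top="new-base"))
        log.append("job: graph changed")

    await gate.drain_begin()
    log.append("drain: begin")
    job = asyncio.create_task(job_completion())
    await asyncio.sleep(0)                # let the job run until it blocks
    await asyncio.sleep(0)
    log.append("drain: end, top=" + graph["top"])
    await gate.drain_end()
    await job
    return log, graph
```

Running asyncio.run(demo()) shows the graph still pointing at "base"
when the drain ends, and the change applied only afterwards. The drain
side never waits for the job coroutine here, which is the property that
keeps the lock from deadlocking the drain in this toy model.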

Kevin
