From: Kevin Wolf <kwolf@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Krempa <pkrempa@redhat.com>, svc-armband <armband@enea.com>,
Markus Armbruster <armbru@redhat.com>,
Jeff Cody <jcody@redhat.com>,
Ciprian Barbu <Ciprian.Barbu@enea.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
Max Reitz <mreitz@redhat.com>,
Alexandru Avadanii <Alexandru.Avadanii@enea.com>
Subject: Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
Date: Thu, 6 Apr 2017 10:48:24 +0200 [thread overview]
Message-ID: <20170406084824.GA4341@noname.redhat.com> (raw)
In-Reply-To: <1556fcc9-50ab-7458-74fe-41756f1dd451@redhat.com>
Am 05.04.2017 um 23:13 hat Paolo Bonzini geschrieben:
> On 05/04/2017 13:01, Kevin Wolf wrote:
> > Am 04.04.2017 um 17:09 hat Paolo Bonzini geschrieben:
> >> On 04/04/2017 16:53, Kevin Wolf wrote:
> >>>> The big question is how this fits into release management. We have
> >>>> another important regression from the op blocker work and only a week
> >>>> to go before the last rc. Are we going to delay 2.9 arbitrarily? Are
> >>>> we going to shorten the 2.10 development period correspondingly? (I
> >>>> vote yes and yes, FWIW).
> >>> Which is the other regression?
> >>
> >> The assertion failure for snapshot_blkdev with iothreads.
> >
> > Ah, right, I keep forgetting that this started appearing with the op
> > blocker series because the failure mode is completely different, so it
> > seems to have been a latent bug somewhere else that was uncovered by it.
> >
> > If we're sure that the change of the order in bdrv_append() is what
> > caused the bug to appear, we can just undo that for 2.9, at the cost of
> > a messed up graph in the error case when bdrv_set_backing_hd() fails
> > (because we have no way to undo bdrv_replace_node()).
>
> I don't know if that is enough to fix all of the issues, but the bug is
> easy to reproduce.
>
> The issue is the lack of understanding of what node movement does to
> quiesce_counter. The invariant is that children cannot have a lower
> quiesce_counter than parents, I think (paths in the graph can only join
> in the children direction, right?).
Maybe I'm missing something, but I think this isn't true at all. Drains
are propagated to the parents, so that this specific node doesn't
receive new requests, but not to the children. The assumption is that
children don't do anything anyway without requests from their parents,
so they are effectively quiesced even with quiesce_counter == 0.
So if anything, the invariant should be the exact opposite: Parents
cannot have a lower quiesce_counter than their children.
I think the exact thing that the quiesce_counter of a node is expected
to be is the number of paths from itself to an explicitly drained node
in the directed block driver graph (counting one path if it is
explicitly drained itself). A path counts multiple times if a node is
explicitly drained multiple times.
> Is it checked, and are there violations already? Maybe we need a
> get_quiesce_counter method in BdrvChildRole, to cover BlockBackend's
> quiesce_counter? Then we can use that information to adjust the
> quiesce_counter when nodes move in the graph.
We would need that if we had a downwards propagation and if a
BlockBackend could be drained, but as it stands, I don't see what could
be missing from bdrv_replace_child_noperm() (well, except that I think
your patch is right to avoid calling drained_end/begin if both nodes
were drained because new requests could sneak in this way in theory).
> The block layer has good tests, but as the internal logic grows more
> complex we should probably have more C level tests. I'm constantly
> impressed by the amount of tricky cases that test-replication.c catches
> in the block job code.
Never really noticed test-replication specifically catching things when
I worked on the op blockers code which changed a lot around block jobs,
but that we should consider this type of tests more often is probably a
good point.
Kevin
next prev parent reply other threads:[~2017-04-06 8:48 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-03-31 16:03 [Qemu-devel] nbd: Possible regression in 2.9 RCs Ciprian Barbu
2017-03-31 17:32 ` Ciprian Barbu
2017-03-31 17:36 ` Ciprian Barbu
2017-03-31 17:43 ` Max Reitz
2017-03-31 17:49 ` Ciprian Barbu
2017-03-31 17:57 ` Max Reitz
2017-03-31 19:07 ` Alexandru Avadanii
2017-04-03 18:52 ` Kashyap Chamarthy
2017-04-04 8:07 ` ciprian.barbu
2017-04-03 8:15 ` Kevin Wolf
2017-04-03 12:39 ` Max Reitz
2017-04-03 13:00 ` Kevin Wolf
2017-04-03 13:50 ` Peter Krempa
2017-04-04 12:16 ` Kevin Wolf
2017-04-04 13:51 ` Eric Blake
2017-04-04 14:04 ` Paolo Bonzini
2017-04-04 14:53 ` Kevin Wolf
2017-04-04 15:09 ` Paolo Bonzini
2017-04-05 11:01 ` Kevin Wolf
2017-04-05 21:13 ` Paolo Bonzini
2017-04-06 8:48 ` Kevin Wolf [this message]
2017-04-06 9:03 ` Kevin Wolf
2017-04-03 19:48 ` Eric Blake
2017-04-03 19:44 ` Eric Blake
2017-04-04 8:17 ` ciprian.barbu
2017-04-04 11:00 ` Paolo Bonzini
2017-04-04 11:14 ` Daniel P. Berrange
2017-04-03 12:51 ` Peter Krempa
2017-04-04 13:19 ` Kevin Wolf
2017-04-04 13:27 ` Peter Krempa
2017-04-04 13:54 ` Kevin Wolf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170406084824.GA4341@noname.redhat.com \
--to=kwolf@redhat.com \
--cc=Alexandru.Avadanii@enea.com \
--cc=Ciprian.Barbu@enea.com \
--cc=armband@enea.com \
--cc=armbru@redhat.com \
--cc=jcody@redhat.com \
--cc=mreitz@redhat.com \
--cc=pbonzini@redhat.com \
--cc=pkrempa@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).