From: Kevin Wolf <kwolf@redhat.com>
To: John Snow <jsnow@redhat.com>
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
qemu-devel <qemu-devel@nongnu.org>,
Qemu-block <qemu-block@nongnu.org>, Max Reitz <mreitz@redhat.com>
Subject: Re: bitmap migration bug with -drive while block mirror runs
Date: Tue, 1 Oct 2019 17:58:59 +0200 [thread overview]
Message-ID: <20191001155859.GE4688@linux.fritz.box> (raw)
In-Reply-To: <c051fd5c-31be-c98b-8155-70fe1b6c1283@redhat.com>
Am 01.10.2019 um 17:09 hat John Snow geschrieben:
>
>
> On 10/1/19 5:54 AM, Kevin Wolf wrote:
> > Am 01.10.2019 um 10:57 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >> 01.10.2019 3:09, John Snow wrote:
> >>> Hi folks, I identified a problem with the migration code that Red Hat QE
> >>> found and thought you'd like to see it:
> >>>
> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
> >>>
> >>> Very, very briefly: drive-mirror inserts a filter node that changes what
> >>> bdrv_get_device_or_node_name() returns, which causes a migration problem.
> >>>
> >>>
> >>> Ignorant question #1: Can we multi-parent the filter node and
> >>> source-node? It looks like at the moment both consider their only parent
> >>> to be the block-job and don't have a link back to their parents otherwise.
> >>>
> >>>
> >>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
> >>> ultimately what we want is to be able to find the "addressable" name for
> >>> the node the bitmap is attached to, which would be the name of the first
> >>> ancestor node that isn't a filter. (OR, the name of the block-backend
> >>> above that node.)
> >>
> >>
> >> Better would be to migrate by node-name only.. But am I right that
> >> node-names are different on source and destination? Or this situation
> >> changed?
> >
> > Traditionally, I think migration assumes that frontends (guest devices)
> > must match exactly, but backends may and usually will differ.
> >
> > Of course, dirty bitmaps are a backend feature that isn't really related
> > to guest devices, so this doesn't really work out any more in your case.
> > BlockBackend names are unusable for this purpose (especially as we're
> > moving towards anonymous BlockBackends everywhere), which I guess
> > essentially means node-name is the only option left.
> >
>
> The problem as I see it involves API stability.
>
> We allow block-dirty-bitmap-add against e.g. "drive1" through the
> block-backend name (the name of the "drive" as the user sees it.)
>
> Of course, once you start mirror, you aren't able to access that bitmap
> through that namepair anymore -- the "address" of the bitmap has "changed"!
>
> (In actual fact, the bitmap always had two addresses; and simply we lost
> an alias -- but it's the one that the user likely used to create the
> bitmap, so that's bad.)
So if I understand correctly, the problem is that without a filter, in
some setups we get a usable BlockBackend name like "drive1", but when a
filter is added, we return the node-name instead which is
auto-generated and will be different on the destination.
Looking at the ChildRole documentation:
/* Returns a name that is supposedly more useful for human users than the
* node name for identifying the node in question (in particular, a BB
* name), or NULL if the parent can't provide a better name. */
const char *(*get_name)(BdrvChild *child);
I'd argue that a BlockBackend name is more useful for a human user even
across filter, so I'd support a .get_name implementation for a filter
child role (which Max wanted to introduce anyway for his filter series).
Of course, if you have a function that is made to return a convenient
text for human users, and you use it for a stable ABI like the migration
stream, this is an idea that would certainly have caused an entertaining
Linus rant in the good old kernel times.
> > Is bitmap migration something that must be enabled explicitly or does
> > it happen automatically? If it's explicit, then making an additional
> > requirement (matching node-names) shouldn't be a problem.
>
> This means that bitmap migration becomes a blockdev-only feature.
I meant this more as the preferred way for the future rather than the
only thing supported.
But Peter has actually mentioned that for libvirt it will be
blockdev-only anyway. So do we even have a good reason to invest much
for the non-blockdev case?
Maybe making it blockdev-only is actually pretty reasonable.
> Serious question: do we have plans to formally deprecate things like
> -drive and mandate a blockdev workflow, or otherwise work to unify the
> actual graph that gets created between the two methods?
It's high on my wishlist, though we can't before libvirt uses blockdev.
Maybe something to talk about at KVM Forum?
> >>> A simple way to do this might be a "child_unfiltered" BdrvChild role
> >>> that simply bypasses the filter that was inserted and serves no real
> >>> purpose other than to allow the child to have a parent link and find who
> >>> it's """real""" parent is.
> >>>
> >>> Because of flushing, reopen, sync, drain &c &c &c I'm not sure how
> >>> feasible this quick idea might be, though.
> >>>
> >>>
> >>> - Corollary fix #1: call error_setg if the bitmap node name that's about
> >>> to go over the wire is an autogenerated node: this is never correct!
> >>>
> >>> (Why not? because the target is incapable of matching the node-name
> >>> because they are randomly generated AND you cannot specify node-names
> >>> with # prefixes as they are especially reserved!
> >>>
> >>> (This raises a related problem: if you explicitly add bitmaps to nodes
> >>> with autogenerated names, you will be unable to migrate them.))
> >>>
> >>
> >> In other words, we need a well defined way to match nodes on source and destination,
> >> keeping in mind filters, to migrate bitmaps correctly.
> >>
>
> Yes, exactly.
>
> >> Hm, did you thought about bitmaps in filters? It's not a problem to create bitmap in
> >> mirror-top filter during mirror job:)
> >>
> >> Or what about bitmaps in Quorum children? Or what about bitmap in qcow2 file child bs?
> >>
> >> If node-names are different on source and destination, what is the same? Top blk name
> >> and bdrv-children names (I recently saw Max's idea to check node "path" in iotest).
> >
> > blk_name has to be assumed to be "". The BdrvChild path changes when
> > filters are inserted (and inserting filters on the destination that
> > aren't present on the source, or vice versa, sounds like something that
> > should just work).
> >
> > So both parts of this are not great ways for addressing nodes.
> >
> >> So, actually node is migration-addressable, if path <blk-name>/root[/child-name] to the
> >> defines this node directly (we must not have children with same name for some node in
> >> the path).
> >>
> >> And I think it's a correct way to define node in migration stream - by path.
> >
> > I'm afraid node-name is the only thing that could possibly work reliably
> > for identifying nodes.
> >
> > Kevin
> >
>
> It sounds like you are saying that bitmaps must become a blockdev-only
> feature.
More like we can't rely on BlockBackend names in a blockdev setup.
BlockBackend names can work in a purely traditional setup if we make
some effort to find the block backend even with filters in bettwen, but
they aren't universal even then. And node-names work in the blockdev
case, but they aren't universal either.
But as I said above, you made me wonder whether making it a
blockdev-only feature wouldn't actually be a good idea.
> I'm not sure if I have arrived at that conclusion yet, but it's at least
> inarguable that with blockdev it's a lot simpler to guarantee correctness.
>
> However, we still have -cdrom and -hda and -drive and any number of
> sugars that I think we aren't committed to getting rid of yet... (or ever?)
-hda and -cdrom currently have hard-coded BlockBackend names. I don't
see what would stop us from changing this to hard-coded node names.
-drive is something that we probably need to remove in its current form
and introduce something new that is both user-friendly and powerful.
Actually, we should probably add the replacement first. :-)
Kevin
next prev parent reply other threads:[~2019-10-01 16:37 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-10-01 0:09 bitmap migration bug with -drive while block mirror runs John Snow
2019-10-01 4:28 ` Peter Krempa
2019-10-01 9:07 ` Vladimir Sementsov-Ogievskiy
2019-10-01 8:57 ` Vladimir Sementsov-Ogievskiy
2019-10-01 9:54 ` Kevin Wolf
2019-10-01 10:05 ` Vladimir Sementsov-Ogievskiy
2019-10-01 13:24 ` Peter Krempa
2019-10-01 15:09 ` John Snow
2019-10-01 15:58 ` Kevin Wolf [this message]
2019-10-01 16:12 ` Vladimir Sementsov-Ogievskiy
2019-10-01 16:24 ` Kevin Wolf
2019-10-01 16:23 ` John Snow
2019-10-01 11:45 ` Peter Krempa
2019-10-01 9:17 ` Vladimir Sementsov-Ogievskiy
2019-10-01 14:00 ` Vladimir Sementsov-Ogievskiy
2019-10-01 14:10 ` John Snow
2019-10-01 15:57 ` Vladimir Sementsov-Ogievskiy
2019-10-01 16:07 ` John Snow
2019-10-02 8:12 ` Kevin Wolf
2019-10-02 10:46 ` Peter Krempa
2019-10-02 11:11 ` Kevin Wolf
2019-10-02 12:22 ` Vladimir Sementsov-Ogievskiy
2019-10-02 13:48 ` Peter Krempa
2019-10-02 13:43 ` Peter Krempa
2019-10-02 14:03 ` Vladimir Sementsov-Ogievskiy
2019-10-02 21:35 ` John Snow
2019-10-03 10:14 ` Vladimir Sementsov-Ogievskiy
2019-10-03 23:34 ` John Snow
2019-10-04 8:33 ` Peter Krempa
2019-10-04 9:21 ` Vladimir Sementsov-Ogievskiy
2019-10-06 3:15 ` John Snow
2019-10-04 9:24 ` Vladimir Sementsov-Ogievskiy
2019-10-04 13:07 ` Eric Blake
2019-10-06 3:19 ` John Snow
2019-10-01 16:16 ` Kevin Wolf
2019-10-01 16:17 ` Vladimir Sementsov-Ogievskiy
2019-10-01 14:13 ` Max Reitz
2019-10-01 14:27 ` Vladimir Sementsov-Ogievskiy
2019-10-01 14:34 ` Max Reitz
2019-10-01 14:53 ` Vladimir Sementsov-Ogievskiy
2019-10-01 15:26 ` Max Reitz
2019-10-02 7:34 ` Peter Krempa
2019-10-01 15:09 ` Kevin Wolf
2019-10-01 15:27 ` Max Reitz
2019-10-01 16:12 ` Kevin Wolf
2019-10-01 16:17 ` Max Reitz
2019-10-01 16:22 ` Vladimir Sementsov-Ogievskiy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191001155859.GE4688@linux.fritz.box \
--to=kwolf@redhat.com \
--cc=jsnow@redhat.com \
--cc=mreitz@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=vsementsov@virtuozzo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).