From: Kevin Wolf <kwolf@redhat.com>
To: John Snow <jsnow@redhat.com>
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
qemu-devel <qemu-devel@nongnu.org>,
Qemu-block <qemu-block@nongnu.org>, Max Reitz <mreitz@redhat.com>
Subject: Re: bitmap migration bug with -drive while block mirror runs
Date: Tue, 1 Oct 2019 17:58:59 +0200 [thread overview]
Message-ID: <20191001155859.GE4688@linux.fritz.box> (raw)
In-Reply-To: <c051fd5c-31be-c98b-8155-70fe1b6c1283@redhat.com>
Am 01.10.2019 um 17:09 hat John Snow geschrieben:
>
>
> On 10/1/19 5:54 AM, Kevin Wolf wrote:
> > Am 01.10.2019 um 10:57 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >> 01.10.2019 3:09, John Snow wrote:
> >>> Hi folks, I identified a problem with the migration code that Red Hat QE
> >>> found and thought you'd like to see it:
> >>>
> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1652424#c20
> >>>
> >>> Very, very briefly: drive-mirror inserts a filter node that changes what
> >>> bdrv_get_device_or_node_name() returns, which causes a migration problem.
> >>>
> >>>
> >>> Ignorant question #1: Can we multi-parent the filter node and
> >>> source-node? It looks like at the moment both consider their only parent
> >>> to be the block-job and don't have a link back to their parents otherwise.
> >>>
> >>>
> >>> Otherwise: I have a lot of cloudy ideas on how to solve this, but
> >>> ultimately what we want is to be able to find the "addressable" name for
> >>> the node the bitmap is attached to, which would be the name of the first
> >>> ancestor node that isn't a filter. (OR, the name of the block-backend
> >>> above that node.)
> >>
> >>
> >> Better would be to migrate by node-name only.. But am I right that
> >> node-names are different on source and destination? Or this situation
> >> changed?
> >
> > Traditionally, I think migration assumes that frontends (guest devices)
> > must match exactly, but backends may and usually will differ.
> >
> > Of course, dirty bitmaps are a backend feature that isn't really related
> > to guest devices, so this doesn't really work out any more in your case.
> > BlockBackend names are unusable for this purpose (especially as we're
> > moving towards anonymous BlockBackends everywhere), which I guess
> > essentially means node-name is the only option left.
> >
>
> The problem as I see it involves API stability.
>
> We allow block-dirty-bitmap-add against e.g. "drive1" through the
> block-backend name (the name of the "drive" as the user sees it.)
>
> Of course, once you start mirror, you aren't able to access that bitmap
> through that namepair anymore -- the "address" of the bitmap has "changed"!
>
> (In actual fact, the bitmap always had two addresses; and simply we lost
> an alias -- but it's the one that the user likely used to create the
> bitmap, so that's bad.)
So if I understand correctly, the problem is that without a filter, in
some setups we get a usable BlockBackend name like "drive1", but when a
filter is added, we return the node-name instead which is
auto-generated and will be different on the destination.
Looking at the ChildRole documentation:
/* Returns a name that is supposedly more useful for human users than the
* node name for identifying the node in question (in particular, a BB
* name), or NULL if the parent can't provide a better name. */
const char *(*get_name)(BdrvChild *child);
I'd argue that a BlockBackend name is more useful for a human user even
across filter, so I'd support a .get_name implementation for a filter
child role (which Max wanted to introduce anyway for his filter series).
Of course, if you have a function that is made to return a convenient
text for human users, and you use it for a stable ABI like the migration
stream, this is an idea that would certainly have caused an entertaining
Linus rant in the good old kernel times.
> > Is bitmap migration something that must be enabled explicitly or does
> > it happen automatically? If it's explicit, then making an additional
> > requirement (matching node-names) shouldn't be a problem.
>
> This means that bitmap migration becomes a blockdev-only feature.
I meant this more as the preferred way for the future rather than the
only thing supported.
But Peter has actually mentioned that for libvirt it will be
blockdev-only anyway. So do we even have a good reason to invest much
for the non-blockdev case?
Maybe making it blockdev-only is actually pretty reasonable.
> Serious question: do we have plans to formally deprecate things like
> -drive and mandate a blockdev workflow, or otherwise work to unify the
> actual graph that gets created between the two methods?
It's high on my wishlist, though we can't before libvirt uses blockdev.
Maybe something to talk about at KVM Forum?
> >>> A simple way to do this might be a "child_unfiltered" BdrvChild role
> >>> that simply bypasses the filter that was inserted and serves no real
> >>> purpose other than to allow the child to have a parent link and find who
> >>> it's """real""" parent is.
> >>>
> >>> Because of flushing, reopen, sync, drain &c &c &c I'm not sure how
> >>> feasible this quick idea might be, though.
> >>>
> >>>
> >>> - Corollary fix #1: call error_setg if the bitmap node name that's about
> >>> to go over the wire is an autogenerated node: this is never correct!
> >>>
> >>> (Why not? because the target is incapable of matching the node-name
> >>> because they are randomly generated AND you cannot specify node-names
> >>> with # prefixes as they are especially reserved!
> >>>
> >>> (This raises a related problem: if you explicitly add bitmaps to nodes
> >>> with autogenerated names, you will be unable to migrate them.))
> >>>
> >>
> >> In other words, we need a well defined way to match nodes on source and destination,
> >> keeping in mind filters, to migrate bitmaps correctly.
> >>
>
> Yes, exactly.
>
> >> Hm, did you thought about bitmaps in filters? It's not a problem to create bitmap in
> >> mirror-top filter during mirror job:)
> >>
> >> Or what about bitmaps in Quorum children? Or what about bitmap in qcow2 file child bs?
> >>
> >> If node-names are different on source and destination, what is the same? Top blk name
> >> and bdrv-children names (I recently saw Max's idea to check node "path" in iotest).
> >
> > blk_name has to be assumed to be "". The BdrvChild path changes when
> > filters are inserted (and inserting filters on the destination that
> > aren't present on the source, or vice versa, sounds like something that
> > should just work).
> >
> > So both parts of this are not great ways for addressing nodes.
> >
> >> So, actually node is migration-addressable, if path <blk-name>/root[/child-name] to the
> >> defines this node directly (we must not have children with same name for some node in
> >> the path).
> >>
> >> And I think it's a correct way to define node in migration stream - by path.
> >
> > I'm afraid node-name is the only thing that could possibly work reliably
> > for identifying nodes.
> >
> > Kevin
> >
>
> It sounds like you are saying that bitmaps must become a blockdev-only
> feature.
More like we can't rely on BlockBackend names in a blockdev setup.
BlockBackend names can work in a purely traditional setup if we make
some effort to find the block backend even with filters in bettwen, but
they aren't universal even then. And node-names work in the blockdev
case, but they aren't universal either.
But as I said above, you made me wonder whether making it a
blockdev-only feature wouldn't actually be a good idea.
> I'm not sure if I have arrived at that conclusion yet, but it's at least
> inarguable that with blockdev it's a lot simpler to guarantee correctness.
>
> However, we still have -cdrom and -hda and -drive and any number of
> sugars that I think we aren't committed to getting rid of yet... (or ever?)
-hda and -cdrom currently have hard-coded BlockBackend names. I don't
see what would stop us from changing this to hard-coded node names.
-drive is something that we probably need to remove in its current form
and introduce something new that is both user-friendly and powerful.
Actually, we should probably add the replacement first. :-)
Kevin
next prev parent reply other threads:[~2019-10-01 16:37 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-10-01 0:09 bitmap migration bug with -drive while block mirror runs John Snow
2019-10-01 4:28 ` Peter Krempa
2019-10-01 9:07 ` Vladimir Sementsov-Ogievskiy
2019-10-01 8:57 ` Vladimir Sementsov-Ogievskiy
2019-10-01 9:54 ` Kevin Wolf
2019-10-01 10:05 ` Vladimir Sementsov-Ogievskiy
2019-10-01 13:24 ` Peter Krempa
2019-10-01 15:09 ` John Snow
2019-10-01 15:58 ` Kevin Wolf [this message]
2019-10-01 16:12 ` Vladimir Sementsov-Ogievskiy
2019-10-01 16:24 ` Kevin Wolf
2019-10-01 16:23 ` John Snow
2019-10-01 11:45 ` Peter Krempa
2019-10-01 9:17 ` Vladimir Sementsov-Ogievskiy
2019-10-01 14:00 ` Vladimir Sementsov-Ogievskiy
2019-10-01 14:10 ` John Snow
2019-10-01 15:57 ` Vladimir Sementsov-Ogievskiy
2019-10-01 16:07 ` John Snow
2019-10-02 8:12 ` Kevin Wolf
2019-10-02 10:46 ` Peter Krempa
2019-10-02 11:11 ` Kevin Wolf
2019-10-02 12:22 ` Vladimir Sementsov-Ogievskiy
2019-10-02 13:48 ` Peter Krempa
2019-10-02 13:43 ` Peter Krempa
2019-10-02 14:03 ` Vladimir Sementsov-Ogievskiy
2019-10-02 21:35 ` John Snow
2019-10-03 10:14 ` Vladimir Sementsov-Ogievskiy
2019-10-03 23:34 ` John Snow
2019-10-04 8:33 ` Peter Krempa
2019-10-04 9:21 ` Vladimir Sementsov-Ogievskiy
2019-10-06 3:15 ` John Snow
2019-10-04 9:24 ` Vladimir Sementsov-Ogievskiy
2019-10-04 13:07 ` Eric Blake
2019-10-06 3:19 ` John Snow
2019-10-01 16:16 ` Kevin Wolf
2019-10-01 16:17 ` Vladimir Sementsov-Ogievskiy
2019-10-01 14:13 ` Max Reitz
2019-10-01 14:27 ` Vladimir Sementsov-Ogievskiy
2019-10-01 14:34 ` Max Reitz
2019-10-01 14:53 ` Vladimir Sementsov-Ogievskiy
2019-10-01 15:26 ` Max Reitz
2019-10-02 7:34 ` Peter Krempa
2019-10-01 15:09 ` Kevin Wolf
2019-10-01 15:27 ` Max Reitz
2019-10-01 16:12 ` Kevin Wolf
2019-10-01 16:17 ` Max Reitz
2019-10-01 16:22 ` Vladimir Sementsov-Ogievskiy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191001155859.GE4688@linux.fritz.box \
--to=kwolf@redhat.com \
--cc=jsnow@redhat.com \
--cc=mreitz@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=vsementsov@virtuozzo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.