From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:46494)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1aalHR-0006gH-8W
	for qemu-devel@nongnu.org; Tue, 01 Mar 2016 09:28:11 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1aalHP-0004uZ-Lk
	for qemu-devel@nongnu.org; Tue, 01 Mar 2016 09:28:05 -0500
Date: Tue, 1 Mar 2016 15:27:54 +0100
From: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20160301142754.GE4250@noname.str.redhat.com>
References: <20160229143649.GD5004@noname.redhat.com>
	<87a8mimqb3.fsf@blackfin.pond.sub.org>
	<20160301141950.GD4250@noname.str.redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160301141950.GD4250@noname.str.redhat.com>
Subject: Re: [Qemu-devel] [Qemu-block] block: Dirty bitmaps and COR in
 bdrv_move_feature_fields()
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Markus Armbruster <armbru@redhat.com>
Cc: pbonzini@redhat.com, famz@redhat.com, qemu-devel@nongnu.org, qemu-block@nongnu.org, mreitz@redhat.com

Am 01.03.2016 um 15:19 hat Kevin Wolf geschrieben:
> Am 01.03.2016 um 11:43 hat Markus Armbruster geschrieben:
> > qemu-block@ without qemu-devel@, intentional?

Oops, I didn't only forget to copy qemu-devel initially, but also
completely missed this when replying. No, not intentional. Forwarding
this one now so that qemu-devel sees at least that this thread exists...

Kevin

> > Kevin Wolf <kwolf@redhat.com> writes:
> > 
> > > Hi all,
> > >
> > > I'm currently trying to get rid of bdrv_move_feature_fields(), so we can
> > > finally have more than one BB per BDS. Generally the way to do this is
> > > to move features from BDS and block.c to BB and block-backend.c.
> > > However, for two of the features I'm not sure about this:
> > >
> > > * Copy on Read:
> > >
> > >   When Jeff introduced bdrv_append() in commit 8802d1fd, the CoR flag
> > >   was already moved to the new top level when taking a snapshot. Does
> > >   anyone remember why it works like that? It doesn't seem to make a lot
> > >   of sense to me.
> > >
> > >   The use case for manually enabled CoR is to avoid reading data twice
> > >   from a slow remote image, so we want to save it to a local overlay,
> > >   say an ISO image accessed via HTTP to a local qcow2 overlay.
> > 
> > Ignorant / forgetful question: we do that by adding a QCOW2 on top with
> > COR enabled, and that makes QCOW2 copy up on read?
> 
> It's not qcow2, but block.c that implements it, but otherwise yes.
> 
> $ qemu-img create -f qcow2 -b http://something.slow disk.qcow2
> $ qemu-system-x86_64 -drive file=disk.qcow2,copy-on-read=on ...
> 
> > >                                                                When
> > >   taking a snapshot, we end up with a backing chain like this:
> > >
> > >       http <- local.qcow2 <- snap_overlay.qcow2
> > 
> > Now COR is enabled where?  Just in snap_overlay.qcow2?
> 
> Yes, that's the current behaviour.
> 
> Of course it still copies everything that is read from the remote host
> because it goes through both qcow2 layer, it just copies a bit more than
> that and duplicates local data in both layers.
> 
> > >   There is no point in performing copy on read from local.qcow2 into
> > >   snap_overlay.qcow2, we just want to keep copying data from the remote
> > >   source into local.qcow2.
> > 
> > Makes sense.
> > 
> > >   Possible caveat: We would be writing to a backing file, but that's
> > >   similar to what some block jobs do, so if we design our op blockers to
> > >   cover this case, it should be fine.
> > 
> > COR would write to backing file local.qcow2.  Doesn't change contents of
> > the http <- local.qcow2 substack, though.
> 
> Right, that's why the operation can be done in the first place.
> 
> > >   I'm actually pretty sure that simply removing COR from the list, and
> > >   therefore changing the behaviour to not move it to the top any more,
> > >   is the right thing to do and could be considered a bug fix.
> > 
> > I'm not sure I got the relation to BBs.  Perhaps its about the rule "if
> > $feature sticks to the top when we put a BDS on top, it should probably
> > live in the BB instead."  Your point seems to be that COR shouldn't
> > stick to the top.  Is that roughly right?
> 
> Not only roughly. :-)
> 
> In order to allow multiple BBs per BDS I need to complete the split now,
> so all features that currently stick to the top by using the remnants of
> bdrv_swap() need to be properly converted to BB instead. The reason for
> that is that if the feature is supposed to be logically part of the BB
> level, different BBs on the same BDS can have different setting.
> 
> If we decide that it shouldn't be in the BB level in the first place,
> instead of converting the feature, I can simply drop it from the
> bdrv_swap remnants.
> 
> > You gave an example where COR should stay put.  Do we know of any use of
> > COR where sticking to the top makes sense?
> 
> To be honest, I'm not sure if any COR users exist out there. The main
> motivation for it was that the streaming block job uses it internally.
> The example I gave is what Anthony used to give when the feature was
> introduced.
> 
> > In general, having the block layer move things around implicitly when
> > the user adds a BDS or BB is prone to create awkward questions like "is
> > this the right move for all possible user intents?"  I hope that the
> > ongoing rework will lead to less implicit magic and more explicit
> > control.
> 
> Implicit magic is becoming harder to implement as I remove bs->blk, so I
> think we're on the right way there.
> 
> > > * Dirty bitmaps:
> > >
> > >   We're currently trying, and if I'm not mistaken failing, to move dirty
> > >   bitmaps to the top. The (back then one) bitmap was first added to the
> > >   list in Paolo's commit a9fc4408, with the following commit message:
> > >
> > >     While these should not be in use at the time a transaction is
> > >     started, a command in the prepare phase of a transaction might have
> > >     added them, so they need to be brought over.
> > >
> > >   At that point, there was no transactionable command that did this in
> > >   the prepare phase. Today we have mirror and backup, but op blockers
> > >   should prevent them from being mixed with snapshots in a single
> > >   transaction, so I can't see how this change had any effect.
> > >
> > >   The reason why I think we're failing to move dirty bitmaps to the top
> > >   today is that we're moving the head of the list to a different object
> > >   without updating the prev link in the first element, so in any case
> > >   it's buggy today.
> > >
> > >   I really would like to keep bitmaps on the BDS where they are, but
> > >   unfortunately, we also have user-defined bitmaps by now, and if we
> > >   change whether they stick with the top level, that's a change that is
> > >   visible on the QMP interface.
> > >
> > >   On the other hand, the QMP interface clearly describes bitmaps as
> > >   belonging to a node rather than a BB (you can use node-name, even with
> > >   no BB attached), so moving them could be considered a bug, even if
> > >   it is the existing behaviour.
> > 
> > You just told us moving doesn't work.  Did it ever work in any release
> > that also provides the QMP interface in question?
> 
> The feature was introduced in 2.4 (commit 341ebc2f) and I think I broke
> it in time for 2.5 (the bdrv_swap() removal in dd62f1ca and the
> following two patches).
> 
> > If no, existing behavior doesn't matter :)
> > 
> > If yes, the interface might be new enough to permit incompatible design
> > flaw fixes.  Paolo thinks bitmaps haven't been used widely.  Discuss
> > with their known users?
> 
> Who knows the known users? John? (CCed)
> 
> > >   I can imagine use cases for both ways, so the interface that would
> > >   make the most sense to me is to generally keep BDSes at their node,
> > >   and to provide a QMP command to move them to a different one.
> > 
> > Explicit control instead of implicit magic --- yes, please.
> > 
> > >   With compatibility in mind, this seems to be a reall tough one,
> > >   though.
> > >
> > > Any comments or ideas how to proceed with those two?
> > 
> > Hope I could help a little.
> 
> Yes, thanks. I hope my answers to your questions give you a clearer
> picture, too.
> 
> Kevin
>