From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46494) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aalHR-0006gH-8W for qemu-devel@nongnu.org; Tue, 01 Mar 2016 09:28:11 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aalHP-0004uZ-Lk for qemu-devel@nongnu.org; Tue, 01 Mar 2016 09:28:05 -0500 Date: Tue, 1 Mar 2016 15:27:54 +0100 From: Kevin Wolf Message-ID: <20160301142754.GE4250@noname.str.redhat.com> References: <20160229143649.GD5004@noname.redhat.com> <87a8mimqb3.fsf@blackfin.pond.sub.org> <20160301141950.GD4250@noname.str.redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160301141950.GD4250@noname.str.redhat.com> Subject: Re: [Qemu-devel] [Qemu-block] block: Dirty bitmaps and COR in bdrv_move_feature_fields() List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Markus Armbruster Cc: pbonzini@redhat.com, famz@redhat.com, qemu-devel@nongnu.org, qemu-block@nongnu.org, mreitz@redhat.com Am 01.03.2016 um 15:19 hat Kevin Wolf geschrieben: > Am 01.03.2016 um 11:43 hat Markus Armbruster geschrieben: > > qemu-block@ without qemu-devel@, intentional? Oops, I didn't only forget to copy qemu-devel initially, but also completely missed this when replying. No, not intentional. Forwarding this one now so that qemu-devel sees at least that this thread exists... Kevin > > Kevin Wolf writes: > > > > > Hi all, > > > > > > I'm currently trying to get rid of bdrv_move_feature_fields(), so we can > > > finally have more than one BB per BDS. Generally the way to do this is > > > to move features from BDS and block.c to BB and block-backend.c. > > > However, for two of the features I'm not sure about this: > > > > > > * Copy on Read: > > > > > > When Jeff introduced bdrv_append() in commit 8802d1fd, the CoR flag > > > was already moved to the new top level when taking a snapshot. Does > > > anyone remember why it works like that? It doesn't seem to make a lot > > > of sense to me. > > > > > > The use case for manually enabled CoR is to avoid reading data twice > > > from a slow remote image, so we want to save it to a local overlay, > > > say an ISO image accessed via HTTP to a local qcow2 overlay. > > > > Ignorant / forgetful question: we do that by adding a QCOW2 on top with > > COR enabled, and that makes QCOW2 copy up on read? > > It's not qcow2, but block.c that implements it, but otherwise yes. > > $ qemu-img create -f qcow2 -b http://something.slow disk.qcow2 > $ qemu-system-x86_64 -drive file=disk.qcow2,copy-on-read=on ... > > > > When > > > taking a snapshot, we end up with a backing chain like this: > > > > > > http <- local.qcow2 <- snap_overlay.qcow2 > > > > Now COR is enabled where? Just in snap_overlay.qcow2? > > Yes, that's the current behaviour. > > Of course it still copies everything that is read from the remote host > because it goes through both qcow2 layer, it just copies a bit more than > that and duplicates local data in both layers. > > > > There is no point in performing copy on read from local.qcow2 into > > > snap_overlay.qcow2, we just want to keep copying data from the remote > > > source into local.qcow2. > > > > Makes sense. > > > > > Possible caveat: We would be writing to a backing file, but that's > > > similar to what some block jobs do, so if we design our op blockers to > > > cover this case, it should be fine. > > > > COR would write to backing file local.qcow2. Doesn't change contents of > > the http <- local.qcow2 substack, though. > > Right, that's why the operation can be done in the first place. > > > > I'm actually pretty sure that simply removing COR from the list, and > > > therefore changing the behaviour to not move it to the top any more, > > > is the right thing to do and could be considered a bug fix. > > > > I'm not sure I got the relation to BBs. Perhaps its about the rule "if > > $feature sticks to the top when we put a BDS on top, it should probably > > live in the BB instead." Your point seems to be that COR shouldn't > > stick to the top. Is that roughly right? > > Not only roughly. :-) > > In order to allow multiple BBs per BDS I need to complete the split now, > so all features that currently stick to the top by using the remnants of > bdrv_swap() need to be properly converted to BB instead. The reason for > that is that if the feature is supposed to be logically part of the BB > level, different BBs on the same BDS can have different setting. > > If we decide that it shouldn't be in the BB level in the first place, > instead of converting the feature, I can simply drop it from the > bdrv_swap remnants. > > > You gave an example where COR should stay put. Do we know of any use of > > COR where sticking to the top makes sense? > > To be honest, I'm not sure if any COR users exist out there. The main > motivation for it was that the streaming block job uses it internally. > The example I gave is what Anthony used to give when the feature was > introduced. > > > In general, having the block layer move things around implicitly when > > the user adds a BDS or BB is prone to create awkward questions like "is > > this the right move for all possible user intents?" I hope that the > > ongoing rework will lead to less implicit magic and more explicit > > control. > > Implicit magic is becoming harder to implement as I remove bs->blk, so I > think we're on the right way there. > > > > * Dirty bitmaps: > > > > > > We're currently trying, and if I'm not mistaken failing, to move dirty > > > bitmaps to the top. The (back then one) bitmap was first added to the > > > list in Paolo's commit a9fc4408, with the following commit message: > > > > > > While these should not be in use at the time a transaction is > > > started, a command in the prepare phase of a transaction might have > > > added them, so they need to be brought over. > > > > > > At that point, there was no transactionable command that did this in > > > the prepare phase. Today we have mirror and backup, but op blockers > > > should prevent them from being mixed with snapshots in a single > > > transaction, so I can't see how this change had any effect. > > > > > > The reason why I think we're failing to move dirty bitmaps to the top > > > today is that we're moving the head of the list to a different object > > > without updating the prev link in the first element, so in any case > > > it's buggy today. > > > > > > I really would like to keep bitmaps on the BDS where they are, but > > > unfortunately, we also have user-defined bitmaps by now, and if we > > > change whether they stick with the top level, that's a change that is > > > visible on the QMP interface. > > > > > > On the other hand, the QMP interface clearly describes bitmaps as > > > belonging to a node rather than a BB (you can use node-name, even with > > > no BB attached), so moving them could be considered a bug, even if > > > it is the existing behaviour. > > > > You just told us moving doesn't work. Did it ever work in any release > > that also provides the QMP interface in question? > > The feature was introduced in 2.4 (commit 341ebc2f) and I think I broke > it in time for 2.5 (the bdrv_swap() removal in dd62f1ca and the > following two patches). > > > If no, existing behavior doesn't matter :) > > > > If yes, the interface might be new enough to permit incompatible design > > flaw fixes. Paolo thinks bitmaps haven't been used widely. Discuss > > with their known users? > > Who knows the known users? John? (CCed) > > > > I can imagine use cases for both ways, so the interface that would > > > make the most sense to me is to generally keep BDSes at their node, > > > and to provide a QMP command to move them to a different one. > > > > Explicit control instead of implicit magic --- yes, please. > > > > > With compatibility in mind, this seems to be a reall tough one, > > > though. > > > > > > Any comments or ideas how to proceed with those two? > > > > Hope I could help a little. > > Yes, thanks. I hope my answers to your questions give you a clearer > picture, too. > > Kevin >